
Showing 1–50 of 336 results for author: Arora, S

  1. arXiv:2407.21009  [pdf, other]

    cs.AI

    AI-Assisted Generation of Difficult Math Questions

    Authors: Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Nan Rosemary Ke, Michael Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal

    Abstract: Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLM…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.07886  [pdf, ps, other]

    math.OC

    Controllability problems of a neutral integro-differential equation with memory

    Authors: Sumit Arora, Akambadath Nandakumaran

    Abstract: The current study addresses the control problems posed by a semilinear neutral integro-differential equation with memory. The primary objectives of this study are to investigate the existence of a mild solution and approximate controllability of both linear and semilinear control systems in Banach spaces. To accomplish this, we begin by introducing the concept of a resolvent family associated with…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    MSC Class: 34K06; 34A12; 37L05; 93B05

  3. arXiv:2407.06037  [pdf, ps, other]

    quant-ph

    Continuous variable quantum teleportation using photon subtracted and photon added two mode squeezed coherent state

    Authors: Shikhar Arora, Chandan Kumar, Arvind

    Abstract: We consider non-Gaussian states generated by photon subtraction (PS) and photon addition (PA) on two-mode squeezed coherent (TMSC) states, as resource states for continuous variable (CV) quantum teleportation (QT). To this end, we derive the Wigner characteristic function for the family of photon subtracted and photon added TMSC states, which is then utilized to calculate the fidelity of teleporti…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Comments are welcome

  4. arXiv:2407.05483  [pdf, other]

    cs.CL cs.LG

    Just read twice: closing the recall gap for recurrent language models

    Authors: Simran Arora, Aman Timalsina, Aaryan Singhal, Benjamin Spector, Sabri Eyuboglu, Xinyi Zhao, Ashish Rao, Atri Rudra, Christopher Ré

    Abstract: Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts, leading to brittle in-context learning (ICL) quality. A key chal…

    Submitted 7 July, 2024; originally announced July 2024.

  5. arXiv:2407.03931  [pdf, other]

    eess.IV cs.CV

    LeDNet: Localization-enabled Deep Neural Network for Multi-Label Radiography Image Classification

    Authors: Lalit Pant, Shubham Arora

    Abstract: Multi-label radiography image classification has long been a topic of interest in neural networks research. In this paper, we intend to classify such images using convolutional neural networks with novel localization techniques. We will use chest x-ray images to detect thoracic diseases for this purpose. For accurate diagnosis, it is crucial to train the network with good-quality images. But man…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 6 pages, 7 figures

  6. arXiv:2407.00989  [pdf, other]

    gr-qc hep-th

    $f(Q,L_m)$ gravity, and its cosmological implications

    Authors: Ayush Hazarika, Simran Arora, P. K. Sahoo, Tiberiu Harko

    Abstract: In the present work, we extend the $f(Q)$ symmetric teleparallel gravity by introducing an arbitrary coupling between the non-metricity $Q$ and matter Lagrangian $L_m$ in the Lagrangian density $f$ of the theory, which thus leads to the $f\left(Q,L_m\right)$ theory. This generalisation encompasses Coincident General Relativity (CGR) and the Symmetric Teleparallel Equivalent to GR (STEGR). Using th…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 21 pages, 11 figures

  7. arXiv:2406.18521  [pdf, other]

    cs.CL cs.CV

    CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

    Authors: Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

    Abstract: Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to ou…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 121 pages, 90 figures

  8. arXiv:2406.17761  [pdf, other]

    cs.CL cs.AI cs.LG

    CaLMQA: Exploring culturally specific long-form question answering across 23 languages

    Authors: Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi

    Abstract: Large language models (LLMs) are used for long-form question answering (LFQA), which requires them to generate paragraph-length answers to complex questions. While LFQA has been well-studied in English, this research has not been extended to other languages. To bridge this gap, we introduce CaLMQA, a collection of 1.5K complex culturally specific questions spanning 23 languages and 51 culturally a…

    Submitted 3 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 39 pages, 17 figures. Code and data available at https://github.com/2015aroras/CaLMQA. Revised argument in section 4, results unchanged

  9. arXiv:2406.16107  [pdf, ps, other]

    eess.AS cs.CL

    Decoder-only Architecture for Streaming End-to-end Speech Recognition

    Authors: Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

    Abstract: Decoder-only language models (LMs) have been successfully adopted for speech-processing tasks including automatic speech recognition (ASR). The LMs have ample expressiveness and perform efficiently. This efficiency is a suitable characteristic for streaming applications of ASR. In this work, we propose to use a decoder-only architecture for blockwise streaming ASR. In our approach, speech features…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  10. arXiv:2406.12611  [pdf, other]

    cs.SD cs.CL eess.AS

    Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

    Authors: Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe

    Abstract: End-to-end multilingual speech recognition models handle multiple languages through a single model, often incorporating language identification to automatically detect the language of incoming speech. Since the common scenario is where the language is already known, these models can perform as language-specific by using language information as prompts, which is particularly beneficial for attentio…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  11. arXiv:2406.12317  [pdf, other]

    cs.CL eess.AS

    Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

    Authors: Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

    Abstract: Recently, multi-task spoken language understanding (SLU) models have emerged, designed to address various speech processing tasks. However, these models often rely on a large number of parameters. Also, they often encounter difficulties in adapting to new data for a specific task without experiencing catastrophic forgetting of previously trained tasks. In this study, we propose finding task-specif…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  12. arXiv:2406.10083  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Evaluation of Speech Foundation Models for Spoken Language Understanding

    Authors: Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for th…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Findings 2024

  13. arXiv:2406.05339  [pdf, other]

    eess.AS cs.AI

    To what extent can ASV systems naturally defend against spoofing attacks?

    Authors: Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, Joon Son Chung

    Abstract: The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically ex…

    Submitted 14 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 3 tables, Interspeech 2024

  14. arXiv:2406.02026  [pdf, other]

    gr-qc

    Cosmological dynamics of interacting dark energy and dark matter in $f(Q)$ gravity

    Authors: Gaurav N. Gadbail, Simran Arora, Phongpichit Channuie, P. K. Sahoo

    Abstract: In this work, we explore the behavior of interacting dark energy and dark matter within a model of $f(Q)$ gravity, employing a standard framework of dynamical system analysis. We consider the power-law $f(Q)$ model incorporating two different forms of interacting dark energy and dark matter: $3αHρ_m$ and $\frac{α}{3H}ρ_m ρ_{DE}$. The evolution of $Ω_m, Ω_r, Ω_{DE}, q$, and $ω$ for different val…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: v1: 14 pages, many figures

  15. arXiv:2406.01305  [pdf, ps, other]

    math.GR

    Forbidden subgraphs of conjugacy class graphs of groups

    Authors: Papi Ray, Sonakshee Arora, Pallabi Manna

    Abstract: Let G be a finite group. The nilpotent/commuting/solvable conjugacy class graph is a simple graph with non-central conjugacy classes of G as its vertex set and two vertices are adjacent if and only if a member of one conjugacy class with a member of another conjugacy class generates a nilpotent/abelian/solvable (sub)group. In this paper, we discuss the forbidden subgraphs of nilpotent, co…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages

    MSC Class: 05C25

  16. arXiv:2406.00869  [pdf, other]

    cs.RO

    Using 3-D LiDAR Data for Safe Physical Human-Robot Interaction

    Authors: Sarthak Arora, Karthik Subramanian, Odysseus Adamides, Ferat Sahin

    Abstract: This paper explores the use of 3D lidar in a physical Human-Robot Interaction (pHRI) scenario. To this end, experiments were conducted to mimic a modern shop-floor environment. Data was collected from a pool of seventeen participants while performing pre-determined tasks in a shared workspace with the robot. To demonstrate an end-to-end case, a perception pipeline was developed t…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE-CASE 2024. Under Review

  17. arXiv:2405.16371  [pdf, ps, other]

    math.FA math.SP

    Eventually positive semigroups: spectral and asymptotic analysis

    Authors: Sahiba Arora

    Abstract: The spectral theory of semigroup generators is a crucial tool for analysing the asymptotic properties of operator semigroups. Typically, Tauberian theorems, such as the ABLV theorem, demand extensive information about the spectrum to derive convergence results. However, the scenario is significantly simplified for positive semigroups on Banach lattices. This observation extends to the broader clas…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 26 pages

  18. arXiv:2405.12205  [pdf, other]

    cs.AI cs.LG

    Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

    Authors: Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora

    Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. We explore this primarily in the context of math reasoning, developing a prompt-guided interac…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Preprint. Under review

  19. arXiv:2405.06787  [pdf, other]

    quant-ph cs.CR

    A computational test of quantum contextuality, and even simpler proofs of quantumness

    Authors: Atul Singh Arora, Kishor Bharti, Alexandru Cojocaru, Andrea Coladangelo

    Abstract: Bell non-locality is a fundamental feature of quantum mechanics whereby measurements performed on "spatially separated" quantum systems can exhibit correlations that cannot be understood as revealing predetermined values. This is a special case of the more general phenomenon of "quantum contextuality", which says that such correlations can occur even when the measurements are not necessarily on se…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 69 pages, 6 figures. For updates see https://atulsingharora.github.io/PoC

  20. arXiv:2405.00201  [pdf, other]

    cs.CL cs.AI

    SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models

    Authors: Samir Arora, Liangliang Wang

    Abstract: Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task. However, the substantial requirements for computational power and storage have discouraged its widespread use. Moreover, increasing evidence of catastrophic forgetting and overparameterization in the Transformer architecture has motivated researchers to seek more effic…

    Submitted 30 April, 2024; originally announced May 2024.

  21. arXiv:2404.17079  [pdf, other]

    quant-ph cs.CR

    Improving device-independent weak coin flipping protocols

    Authors: Atul Singh Arora, Jamie Sikora, Thomas Van Himbeeck

    Abstract: Weak coin flipping is the cryptographic task where Alice and Bob remotely flip a coin but want opposite outcomes. This work studies this task in the device-independent regime where Alice and Bob neither trust each other, nor their quantum devices. The best protocol was devised over a decade ago by Silman, Chailloux, Aharon, Kerenidis, Pironio, and Massar with bias $\varepsilon \approx 0.33664$, wh…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 25 pages, 7 figures

  22. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  23. PDS 70 unveiled by star-hopping: total intensity, polarimetry and mm-imaging modeled in concert

    Authors: Z. Wahhaj, M. Benisty, C. Ginski, C. Swastik, S. Arora, R. G. van Holstein, R. J. De Rosa, B. Yang, J. Bae, B. Ren

    Abstract: Context. Most ground-based planet search direct imaging campaigns use angular differential imaging, which distorts the signal from extended sources like protoplanetary disks. In the case of PDS 70, a young system with two planets found within the cavity of a protoplanetary disk, obtaining a reliable image of both planets and disk is essential to understanding planet-disk interactions. Aims. Our goals…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted to A&A on April 11, 2024. 20 pages, 19 figures

    Journal ref: A&A 687, A257 (2024)

  24. arXiv:2404.02116  [pdf, ps, other]

    math.FA

    The lattice structure of negative Sobolev and extrapolation spaces

    Authors: Sahiba Arora, Jochen Glück, Felix L. Schwenninger

    Abstract: It is well-known that the Sobolev spaces $W^{k,p}(\mathbb R^d)$ are vector lattices with respect to the pointwise almost everywhere order if $k \in \{0,1\}$, but not if $k \ge 2$. In this note, we consider negative $k$ and show that the span of the positive cone in $W^{k,p}(\mathbb R^d)$ is a vector lattice in this case. We also prove a related abstract result: if $(T(t))_{t \in [0,\infty)}$ is…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 9 pages

    MSC Class: 46B40; 46B42; 46E35

  25. Limit-case admissibility for positive infinite-dimensional systems

    Authors: Sahiba Arora, Jochen Glück, Lassi Paunonen, Felix L. Schwenninger

    Abstract: In the context of positive infinite-dimensional linear systems, we systematically study $L^p$-admissible control and observation operators with respect to the limit-cases $p=\infty$ and $p=1$, respectively. This requires an in-depth understanding of the order structure on the extrapolation space $X_{-1}$, which we provide. These properties of $X_{-1}$ also enable us to discuss when zero-class admi…

    Submitted 4 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 30 pages, corrected minor typos, added Remark 3.6, minor correction to Example 5.2

  26. arXiv:2403.09603  [pdf, other]

    cs.CR cs.AI cs.LG

    Optimistic Verifiable Training by Controlling Hardware Nondeterminism

    Authors: Megha Srivastava, Simran Arora, Dan Boneh

    Abstract: The increasing compute demands of AI systems have led to the emergence of services that train models on behalf of clients lacking necessary resources. However, ensuring correctness of training and guarding against potential training-time attacks, such as data poisoning, poses challenges. Existing works on verifiable training largely fall into two classes: proof-based systems, which struggle to scal…

    Submitted 16 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 5 figures, preprint

  27. arXiv:2403.00887  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

    Authors: Aron R, Indra Sigicharla, Chirag Periwal, Mohanaprasad K, Nithya Darisini P S, Sourabh Tiwari, Shivani Arora

    Abstract: The interpretation of human voices holds importance across various applications. This study ventures into predicting age, gender, and emotion from vocal cues, a field with vast applications. Voice analysis tech advancements span domains, from improving customer interactions to enhancing healthcare and retail experiences. Discerning emotions aids mental health, while age and gender detection are vi…

    Submitted 1 March, 2024; originally announced March 2024.

  28. arXiv:2402.18668  [pdf, other]

    cs.CL cs.LG

    Simple linear attention language models balance the recall-throughput tradeoff

    Authors: Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré

    Abstract: Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is bottlenecked during inference by the KV-cache's aggressive memory consumption. In this work, we explore whether we can improve language model efficiency (e.g. by reducing memory consumption) without…

    Submitted 28 February, 2024; originally announced February 2024.

  29. arXiv:2402.18540  [pdf, other]

    cs.LG cs.AI cs.CL

    Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

    Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora

    Abstract: Public LLMs such as Llama 2-Chat have driven huge activity in LLM research. These models underwent alignment training and were considered safe. Recently Qi et al. (2023) reported that even benign fine-tuning (e.g., on seemingly safe datasets) can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. Through extens…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 20 pages

  30. Energy conditions in the $f(R,L,T)$ theory of gravity

    Authors: Simran Arora, P. H. R. S. Moraes, P. K. Sahoo

    Abstract: We construct the energy conditions for the recently proposed $f(R,L,T)$ gravity theory, for which $f$ is a generic function of the Ricci scalar $R$, matter Lagrangian density $L$ and trace of the energy-momentum tensor $T$. We analyse two different forms for the $f(R,L,T)$ function within the framework of the Friedmann-Lemaître-Robertson-Walker universe. We constrain the model parameters from the…

    Submitted 5 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: EPJ Plus accepted version

    Journal ref: European Physical Journal Plus, 139(6) (2024) 542

  31. arXiv:2402.16021  [pdf, other]

    cs.CL cs.AI cs.CV eess.AS

    TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

    Authors: Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro

    Abstract: The capability to jointly process multi-modal information is becoming an essential task. However, the limited number of paired multi-modal data and the large computational requirements in multi-modal learning hinder the development. We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text. We introduce a novel viewpoint, whe…

    Submitted 25 February, 2024; originally announced February 2024.

  32. arXiv:2402.15855  [pdf, other]

    quant-ph cs.CR

    Protocols for Quantum Weak Coin Flipping

    Authors: Atul Singh Arora, Jérémie Roland, Chrysoula Vlachou, Stephan Weis

    Abstract: Weak coin flipping is an important cryptographic primitive -- it is the strongest known secure two-party computation primitive that classically becomes secure only under certain assumptions (e.g. computational hardness), while quantumly there exist protocols that achieve arbitrarily close to perfect security. This breakthrough result was established by Mochon in 2007 [arXiv:0711.4114]. However, hi…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 51 pages (+ 9 appendix), 12 figures. This is a self-contained, concise version of our main results in arXiv:1811.02984 (STOC '19) and arXiv:1911.13283v2 (SODA '21). The Cryptology ePrint 2022/1101 is the comprehensive version, subsuming the above

  33. arXiv:2402.11111  [pdf, other]

    cs.CL

    Language Models as Science Tutors

    Authors: Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodríguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, Zirui Wang, Xindi Wu, Mengzhou Xia, Wenhan Xia, Jiatong Yu, Jun-Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen

    Abstract: NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use-cases of LMs for science, including applications in education that require processing long scientific documents. To address this, we introduce TutorEval and TutorChat. TutorEval is a diverse question-answering bench…

    Submitted 21 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 8 pages without bibliography and appendix, 26 pages total

  34. arXiv:2402.07440  [pdf, other]

    cs.IR cs.LG

    Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

    Authors: Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré

    Abstract: Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval perform…

    Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  35. Reconstruction of the singularity-free $f(\mathcal{R})$ gravity via Raychaudhuri equations

    Authors: Gaurav N. Gadbail, Simran Arora, P. K. Sahoo, Kazuharu Bamba

    Abstract: We study the bounce cosmology to construct a singularity-free $f(\mathcal{R})$ model using the reconstruction technique. The formulation of the $f(\mathcal{R})$ model is based on the Raychaudhuri equation, a key element employed in reconstructed models to eliminate singularities. We explore the feasibility of obtaining stable gravitational Lagrangians, adhering to the conditions…

    Submitted 27 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: EPJC published version

    Journal ref: Eur. Phys. J. C 84(7), (2024) 752

  36. arXiv:2402.04333  [pdf, other]

    cs.CL cs.AI cs.LG

    LESS: Selecting Influential Data for Targeted Instruction Tuning

    Authors: Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

    Abstract: Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots. However, real-world applications often require a specialized suite of skills (e.g., reasoning). The challenge lies in identifying the most relevant data from these extensive datasets to effectively develop specific capabilities, a setting we…

    Submitted 12 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ICML 2024; Code and data are available at https://github.com/princeton-nlp/LESS

  37. arXiv:2402.00838  [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  38. arXiv:2401.16658  [pdf, ps, other]

    cs.CL eess.AS

    OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

    Authors: Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe

    Abstract: Recent studies have highlighted the importance of fully open foundation models. The Open Whisper-style Speech Model (OWSM) is an initial step towards reproducing OpenAI Whisper using public data and open-source toolkits. However, previous versions of OWSM (v1 to v3) are still based on the standard Transformer, which might lead to inferior performance compared to state-of-the-art speech encoder archite…

    Submitted 16 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at INTERSPEECH 2024. Webpage: https://www.wavlab.org/activities/2024/owsm/

  39. arXiv:2401.08520  [pdf, other]

    cs.CR cs.CE

    SecPLF: Secure Protocols for Loanable Funds against Oracle Manipulation Attacks

    Authors: Sanidhay Arora, Yingjiu Li, Yebo Feng, Jiahua Xu

    Abstract: The evolving landscape of Decentralized Finance (DeFi) has raised critical security concerns, especially pertaining to Protocols for Loanable Funds (PLFs) and their dependency on price oracles, which are susceptible to manipulation. The emergence of flash loans has further amplified these risks, enabling increasingly complex oracle manipulation attacks that can lead to significant financial losses…

    Submitted 16 January, 2024; originally announced January 2024.

  40. arXiv:2401.00353  [pdf, other]

    cs.IR

    EXPLORE -- Explainable Song Recommendation

    Authors: Abhinav Arun, Mehul Soni, Palash Choudhary, Saksham Arora

    Abstract: This study explores the development of an explainable music recommendation system with enhanced user control. Leveraging a hybrid of collaborative filtering and content-based filtering, we address the challenges of opaque recommendation logic and lack of user influence on results. We present a novel approach combining advanced algorithms and an interactive user interface. Our methodology integrate…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: 6 pages, 7 figures

  41. arXiv:2401.00054  [pdf, other]

    gr-qc

    Late Time Acceleration with Observational Constraints in Modified Theories of Gravity

    Authors: Simran Arora

    Abstract: The late time acceleration of the Universe has challenged contemporary cosmology since its discovery. General Relativity explains this phenomenon by introducing the cosmological constant, named the standard cosmological model ($Λ$CDM). However, the cosmological constant solution has several drawbacks that have led cosmologists to explore and propose alternative models to explain the late time acce…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: Ph.D. Thesis. Some chapters are published in the following journals: Classical and Quantum Gravity 37, 205022 (2020); Monthly Notices of the Royal Astronomical Society 522, 252-267 (2023); Physics of the Dark Universe 30, 100664 (2020); European Physics Journal C 81, 555 (2021); 133 Pages

  42. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  43. arXiv:2312.09582  [pdf, other]

    cs.CL cs.SD eess.AS

    Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

    Authors: Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe

    Abstract: In speech recognition applications, it is important to recognize context-specific rare words, such as proper nouns. Tree-constrained Pointer Generator (TCPGen) has shown promise for this purpose, which efficiently biases such words with a prefix tree. While the original TCPGen relies on grapheme-based encoding, we propose extending it with phoneme-aware encoding to better recognize words of unusua…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP2024

  44. Revisiting kink-like parametrization and constraints using OHD/Pantheon+/BAO samples

    Authors: Simran Arora, P. K. Sahoo

    Abstract: We reexamine the kink-like parameterization of the deceleration parameter to derive constraints on the transition redshift from cosmic deceleration to acceleration. This is achieved using observational Hubble data, Type Ia Supernovae Pantheon+ samples and Baryon acoustic oscillations. In this parametrization, the value of the initial $q$ parameter is $q_{i}$, the final value is $q_f$, the present…

    Submitted 29 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Physics of the Dark Universe published version

    Journal ref: Physics of the Dark Universe 45 (2024) 101510

  45. arXiv:2312.04927  [pdf, other]

    cs.CL cs.LG

    Zoology: Measuring and Improving Recall in Efficient Language Models

    Authors: Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré

    Abstract: Attention-free language models that combine gating and convolutions are growing in popularity due to their efficiency and increasingly competitive performance. To better understand these architectures, we pretrain a suite of 17 attention and "gated-convolution" language models, finding that SoTA gated-convolution architectures still underperform attention by up to 2.1 perplexity points on the Pile…

    Submitted 8 December, 2023; originally announced December 2023.

  46. RELIC: Investigating Large Language Model Responses using Self-Consistency

    Authors: Furui Cheng, Vilém Zouhar, Simran Arora, Mrinmaya Sachan, Hendrik Strobelt, Mennatallah El-Assady

    Abstract: Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To address this challenge, we propose an interactive system that helps users gain insight into the reliability of the generated text. Our approach is based on the idea that the self-consistency of multiple samples generated by the same LLM relates to its confidence…

    Submitted 4 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  47. arXiv:2311.15268  [pdf, other]

    cs.LG cs.AI

    Unlearning via Sparse Representations

    Authors: Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

    Abstract: Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's p…

    Submitted 26 November, 2023; originally announced November 2023.

  48. arXiv:2310.17567  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

    Authors: Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora

    Abstract: With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned. The capability to combine skills plays an important role in (human) pedagogy and also in a paper on emergence phenomena (Arora & Goyal, 2023). This…

    Submitted 26 October, 2023; originally announced October 2023.

  49. arXiv:2310.17053  [pdf, other]

    math-ph

    Invariant Physics-Informed Neural Networks for Ordinary Differential Equations

    Authors: Shivam Arora, Alex Bihlo, Francis Valiquette

    Abstract: Physics-informed neural networks have emerged as a prominent new method for solving differential equations. While conceptually straightforward, they often suffer training difficulties that lead to relatively large discretization errors or the failure to obtain correct solutions. In this paper we introduce invariant physics-informed neural networks for ordinary differential equations that admit a f…

    Submitted 11 March, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 23 pages, 6 figures

    MSC Class: 58D19

  50. arXiv:2310.14423  [pdf, other]

    cs.LG

    A Quadratic Synchronization Rule for Distributed Deep Learning

    Authors: Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

    Abstract: In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Local SGD, address this issue by allowing workers to compute locally for $H$ steps without synchronizing with others, hence reducing communication frequency. While…

    Submitted 12 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: camera-ready version for ICLR'24