Showing 1–50 of 134 results for author: Subramanian, S

Searching in archive cs. Search in all archives.
  1. arXiv:2406.16782  [pdf, other]

    cs.LG

    Confidence Aware Inverse Constrained Reinforcement Learning

    Authors: Sriram Ganapathi Subramanian, Guiliang Liu, Mohammed Elmahgiubi, Kasra Rezaee, Pascal Poupart

    Abstract: In coming up with solutions to real-world problems, humans implicitly adhere to constraints that are too numerous and complex to be specified completely. However, reinforcement learning (RL) agents need these constraints to learn the correct optimal policy in these settings. The field of Inverse Constraint Reinforcement Learning (ICRL) deals with this problem and provides algorithms that aim to es…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Paper to appear in ICML 2024

  2. arXiv:2406.11704  [pdf, other]

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia: Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.10276  [pdf, other]

    cs.CL cs.SD eess.AS

    Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

    Authors: Peidong Wang, Jian Xue, Jinyu Li, Junkun Chen, Aswin Shanmugam Subramanian

    Abstract: Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the o…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.06459  [pdf, other]

    cs.LG

    How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

    Authors: Agustinus Kristiadi, Felix Strieth-Kalthoff, Sriram Ganapathi Subramanian, Vincent Fortuin, Pascal Poupart, Geoff Pleiss

    Abstract: Bayesian optimization (BO) is an integral part of automated scientific discovery -- the so-called self-driving lab -- where human inputs are ideally minimal or at least non-blocking. However, scientists often have strong intuition, and thus human feedback is still useful. Nevertheless, prior works in enhancing BO with expert feedback, such as by incorporating it in an offline or online but blockin…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: AABI 2024. Code: https://github.com/wiseodd/bo-async-feedback

  5. arXiv:2405.20935  [pdf, other]

    cs.LG cs.AI

    Effective Interplay between Sparsity and Quantization: From Theory to Practice

    Authors: Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh

    Abstract: The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonstrated significant reduction in computational and memory footprints while preserving model accuracy. While effective, the interplay between these two m…

    Submitted 31 May, 2024; originally announced May 2024.

  6. arXiv:2405.16129  [pdf, other]

    cs.CL

    iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers

    Authors: Harshit Gupta, Manav Chaudhary, Tathagata Raha, Shivansh Subramanian, Vasudeva Varma

    Abstract: This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice Question Answering designed to evaluate the models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default common-sense associations and exhibit unconventional thinking. We propo…

    Submitted 25 May, 2024; originally announced May 2024.

  7. arXiv:2405.03689  [pdf, other]

    cs.CV cs.CL

    Pose Priors from Language Models

    Authors: Sanjay Subramanian, Evonne Ng, Lea Müller, Dan Klein, Shiry Ginosar, Trevor Darrell

    Abstract: We present a zero-shot pose optimization method that enforces accurate physical contact constraints when estimating the 3D pose of humans. Our central insight is that since language is often used to describe physical interaction, large pretrained text-based models can act as priors on pose estimation. We can thus leverage this insight to improve pose estimation by converting natural language des…

    Submitted 6 May, 2024; originally announced May 2024.

  8. arXiv:2404.19630  [pdf, other]

    cs.LG

    Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction

    Authors: Jared D. Willard, Peter Harrington, Shashank Subramanian, Ankur Mahesh, Travis A. O'Brien, William D. Collins

    Abstract: The rapid rise of deep learning (DL) in numerical weather prediction (NWP) has led to a proliferation of models which forecast atmospheric variables with comparable or superior skill than traditional physics-based NWP. However, among these leading DL models, there is a wide variance in both the training settings and architecture used. Further, the lack of thorough ablation studies makes it hard to…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures

    MSC Class: 68T07; 86A10 ACM Class: J.2; I.2.6

    Journal ref: 23rd Conference on Artificial Intelligence for Environmental Science. Jan 2024. Abstract #437874

  9. arXiv:2404.01476  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    TraveLER: A Multi-LMM Agent Framework for Video Question-Answering

    Authors: Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig

    Abstract: Recently, Large Multimodal Models (LMMs) have made significant progress in video question-answering using a frame-wise approach by leveraging large-scale, image-based pretraining in a zero-shot manner. While image-based methods for videos have shown impressive performance, a current limitation is that they often overlook how key timestamps are selected and cannot adjust when incorrect timestamps a…

    Submitted 1 April, 2024; originally announced April 2024.

  10. arXiv:2402.16819  [pdf, other]

    cs.CL cs.AI cs.LG

    Nemotron-4 15B Technical Report

    Authors: Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, et al. (2 additional authors not shown)

    Abstract: We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remai…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  11. arXiv:2402.15734  [pdf, other]

    cs.LG stat.ML

    Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning

    Authors: Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, Michael W. Mahoney

    Abstract: Recent years have witnessed the promise of coupling machine learning methods and physical domain-specific insights for solving scientific problems based on partial differential equations (PDEs). However, being data-intensive, these methods still require a large amount of PDE data. This reintroduces the need for expensive numerical PDE solutions, partially undermining the original goal of avoiding t…

    Submitted 13 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  12. arXiv:2402.04744  [pdf, other]

    cs.LG cs.AR

    Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

    Authors: Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

    Abstract: N:M structured sparsity has garnered significant interest as a result of relatively modest overhead and improved efficiency. Additionally, this form of sparsity holds considerable appeal for reducing the memory footprint owing to its modest representation overhead. Although there have been efforts to develop training recipes for N:M structured sparsity, they primarily focus on low-sparsity regions (…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 18 pages, 8 figures, 17 tables. Code is available at https://github.com/abhibambhaniya/progressive_gradient_flow_nm_sparsity

  13. arXiv:2402.02006  [pdf, other]

    cs.LG

    PresAIse, A Prescriptive AI Solution for Enterprises

    Authors: Wei Sun, Scott McFaddin, Linh Ha Tran, Shivaram Subramanian, Kristjan Greenewald, Yeshi Tenzin, Zack Xue, Youssef Drissi, Markus Ettl

    Abstract: Prescriptive AI represents a transformative shift in decision-making, offering causal insights and actionable recommendations. Despite its huge potential, enterprise adoption often faces several challenges. The first challenge is caused by the limitations of observational data for accurate causal inference which is typically a prerequisite for good decision-making. The second pertains to the inter…

    Submitted 12 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 14 pages

  14. arXiv:2401.04088  [pdf, other]

    cs.LG cs.CL

    Mixtral of Experts

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, et al. (1 additional author not shown)

    Abstract: We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected e…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: See more details at https://mistral.ai/news/mixtral-of-experts/

  15. arXiv:2312.02249  [pdf, other]

    cs.CV cs.CL

    Recursive Visual Programming

    Authors: Jiaxin Ge, Sanjay Subramanian, Baifeng Shi, Roei Herzig, Trevor Darrell

    Abstract: Visual Programming (VP) has emerged as a powerful framework for Visual Question Answering (VQA). By generating and executing bespoke code for each question, these methods demonstrate impressive compositional and reasoning capabilities, especially in few-shot and zero-shot scenarios. However, existing VP methods generate all code in a single function, resulting in code that is suboptimal in terms o…

    Submitted 10 July, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  16. arXiv:2311.12391  [pdf, other]

    cs.CV

    From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

    Authors: Jiaxin Ge, Sanjay Subramanian, Trevor Darrell, Boyi Li

    Abstract: Addressing the challenge of adapting pre-trained vision-language models for generating insightful explanations for visual reasoning tasks with limited annotations, we present ReVisE: a Recursive Visual Explanation algorithm. Our method iteratively computes visual features (conditioned on the text input), an answer, and an explanation, to improve the explanation qua…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Main

  17. arXiv:2311.05529  [pdf, other]

    quant-ph cs.CC cs.IT cs.LG

    Information-theoretic generalization bounds for learning from quantum data

    Authors: Matthias Caro, Tom Gur, Cambyse Rouzé, Daniel Stilck França, Sathyawageeswar Subramanian

    Abstract: Learning tasks play an increasingly prominent role in quantum information and computation. They range from fundamental problems such as state discrimination and metrology over the framework of quantum probably approximately correct (PAC) learning, to the recently proposed shadow variants of state tomography. However, the many directions of quantum learning theory have so far evolved separately. We…

    Submitted 18 June, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: 48+14 pages, 4 figures

  18. arXiv:2310.12183  [pdf, other]

    math.OC cs.AI

    An Optimistic-Robust Approach for Dynamic Positioning of Omnichannel Inventories

    Authors: Pavithra Harsha, Shivaram Subramanian, Ali Koc, Mahesh Ramakrishna, Brian Quanz, Dhruv Shah, Chandra Narayanaswami

    Abstract: We introduce a new class of data-driven and distribution-free optimistic-robust bimodal inventory optimization (BIO) strategy to effectively allocate inventory across a retail chain to meet time-varying, uncertain omnichannel demand. While prior Robust optimization (RO) methods emphasize the downside, i.e., worst-case adversarial demand, BIO also considers the upside to remain resilient like RO wh…

    Submitted 17 October, 2023; originally announced October 2023.

  19. arXiv:2310.10806  [pdf, other]

    eess.IV cs.CV cs.LG

    Convolutional Neural Network Model for Diabetic Retinopathy Feature Extraction and Classification

    Authors: Sharan Subramanian, Leilani H. Gilpin

    Abstract: The application of Artificial Intelligence in the medical market brings up increasing concerns but aids in more timely diagnosis of silent progressing diseases like Diabetic Retinopathy. In order to diagnose Diabetic Retinopathy (DR), ophthalmologists use color fundus images, or pictures of the back of the retina, to identify small distinct features through a difficult and time-consuming process.…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 9 pages, 2 tables, 5 figures

  20. arXiv:2310.03025  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Retrieval meets Long Context Large Language Models

    Authors: Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? ii) Can both methods be combined to get the best of both worlds? In this work, we answer these questions by stu…

    Submitted 23 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024

  21. arXiv:2308.10897  [pdf, other]

    cs.CV

    Can Language Models Learn to Listen?

    Authors: Evonne Ng, Sanjay Subramanian, Dan Klein, Angjoo Kanazawa, Trevor Darrell, Shiry Ginosar

    Abstract: We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words. Given an input transcription of the speaker's words with their timestamps, our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE. Since gesture is a language component, we propose t…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023; Project page: https://people.eecs.berkeley.edu/~evonne_ng/projects/text2listen/

  22. arXiv:2307.15615  [pdf, other]

    eess.IV cs.CV

    A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond

    Authors: Junyu Chen, Yihao Liu, Shuwen Wei, Zhangxing Bian, Shalini Subramanian, Aaron Carass, Jerry L. Prince, Yong Du

    Abstract: Deep learning technologies have dramatically reshaped the field of medical image registration over the past decade. The initial developments, such as regression-based and U-Net-based networks, established the foundation for deep learning in image registration. Subsequent progress has been made in various aspects of deep learning-based registration, including similarity measures, deformation regula…

    Submitted 30 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: A list of open-sourced code from the papers reviewed has been organized and is available at https://bit.ly/3QgFJ9z

  23. arXiv:2306.14070  [pdf, other]

    cs.CV eess.IV physics.comp-ph

    SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning

    Authors: Pu Ren, N. Benjamin Erichson, Shashank Subramanian, Omer San, Zarija Lukic, Michael W. Mahoney

    Abstract: Super-Resolution (SR) techniques aim to enhance data resolution, enabling the retrieval of finer details, and improving the overall quality and fidelity of the data representation. There is growing interest in applying SR methods to complex spatiotemporal systems within the Scientific Machine Learning (SciML) community, with the hope of accelerating numerical simulations and/or improving forecasts…

    Submitted 24 June, 2023; originally announced June 2023.

  24. arXiv:2306.10619  [pdf, other]

    cs.LG math.NA physics.flu-dyn

    Towards Stability of Autoregressive Neural Operators

    Authors: Michael McCabe, Peter Harrington, Shashank Subramanian, Jed Brown

    Abstract: Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging as they incur significant computational and memory expense -- these systems are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effe…

    Submitted 10 December, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Journal ref: Transactions on Machine Learning Research. November 2023

  25. arXiv:2306.05392  [pdf, other]

    cs.CL

    Modular Visual Question Answering via Code Generation

    Authors: Sanjay Subramanian, Medhini Narasimhan, Kushal Khangaonkar, Kevin Yang, Arsha Nagrani, Cordelia Schmid, Andy Zeng, Trevor Darrell, Dan Klein

    Abstract: We present a framework that formulates visual question answering as modular code generation. In contrast to prior work on modular approaches to VQA, our approach requires no additional training and relies on pre-trained language models (LMs), visual models pre-trained on image-caption pairs, and fifty VQA examples used for in-context learning. The generated Python programs invoke and compose the o…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  26. arXiv:2306.00258  [pdf, other]

    cs.LG math.NA

    Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

    Authors: Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael Mahoney, Amir Gholami

    Abstract: Pre-trained machine learning (ML) models have shown great performance for a wide range of applications, in particular in natural language processing (NLP) and computer vision (CV). Here, we study how pre-training could be used for scientific machine learning (SciML) applications, specifically in the context of transfer learning. We study the transfer behavior of these models as (i) the pre-trained…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 16 pages, 11 figures

    Journal ref: NeurIPS 2023

  27. arXiv:2305.14177  [pdf, other]

    cs.LG physics.chem-ph

    ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry

    Authors: Chris Beeler, Sriram Ganapathi Subramanian, Kyle Sprague, Nouha Chatti, Colin Bellinger, Mitchell Shahen, Nicholas Paquin, Mark Baula, Amanuel Dawit, Zihan Yang, Xinkai Li, Mark Crowley, Isaac Tamblyn

    Abstract: This paper provides a simulated laboratory for making use of Reinforcement Learning (RL) for chemical discovery. Since RL is fairly data intensive, training agents 'on-the-fly' by taking actions in the real world is infeasible and possibly dangerous. Moreover, chemical processing and discovery involves challenges which are not commonly found in RL benchmarks and therefore offer a rich space to wor…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 19 pages, 13 figures, 2 tables

  28. arXiv:2304.14082  [pdf, other]

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

  29. arXiv:2304.01433  [pdf]

    cs.AR cs.AI cs.LG cs.PF

    TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings

    Authors: Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson

    Abstract: In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and perfo…

    Submitted 20 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: 15 pages; 16 figures; to be published at ISCA 2023 (the International Symposium on Computer Architecture)

  30. arXiv:2303.17951  [pdf, other]

    cs.LG

    FP8 versus INT8 for efficient deep learning inference

    Authors: Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort

    Abstract: Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed-precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive t…

    Submitted 15 June, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

  31. arXiv:2303.12308  [pdf, other]

    cs.CL

    XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages

    Authors: Dhaval Taunk, Shivprasad Sagare, Anupam Patil, Shivansh Subramanian, Manish Gupta, Vasudeva Varma

    Abstract: Lack of encyclopedic text contributors, especially on Wikipedia, makes automated text generation for low resource (LR) languages a critical problem. Existing work on Wikipedia text generation has focused on English only where English reference articles are summarized to generate English Wikipedia pages. But, for low-resource languages, the scarcity of reference articles makes monolingual summariza…

    Submitted 18 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  32. arXiv:2303.10271  [pdf, other]

    cs.AR

    VPU-EM: An Event-based Modeling Framework to Evaluate NPU Performance and Power Efficiency at Scale

    Authors: Charles Qi, Yi Wang, Hui Wang, Yang Lu, Shiva Shankar Subramanian, Finola Cahill, Conall Tuohy, Victor Li, Xu Qian, Darren Crews, Ling Wang, Shivaji Roy, Andrea Deidda, Martin Power, Niall Hanrahan, Rick Richmond, Umer Cheema, Arnab Raha, Alessandro Palla, Gary Baugh, Deepak Mathaikutty

    Abstract: State-of-the-art NPUs are typically architected as a self-contained sub-system with multiple heterogeneous hardware computing modules, and a dataflow-driven programming model. The industry lacks well-established methodology and tools to evaluate and compare the performance of NPUs from different architectures. We present an event-based performance modeling framework, VPU-EM, targeting scalabl…

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: 8 pages, 9 figures

    ACM Class: B.2.2; B.8.2

  33. arXiv:2303.03849  [pdf, other]

    eess.AS cs.SD

    TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

    Authors: Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux

    Abstract: Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly. It builds upon the target-speaker voice activity detection (TS-VAD) diarization approach, which assumes that initial speaker embeddings are available. We replace the final combined speaker activity estimation network of TS-VAD with a network that pro…

    Submitted 1 January, 2024; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Submitted to IEEE/ACM TASLP

  34. arXiv:2302.06812  [pdf, other]

    cs.LG

    Scalable Optimal Multiway-Split Decision Trees with Constraints

    Authors: Shivaram Subramanian, Wei Sun

    Abstract: There has been a surge of interest in learning optimal decision trees using mixed-integer programs (MIP) in recent years, as heuristic-based methods do not guarantee optimality and find it challenging to incorporate constraints that are critical for many practical applications. However, existing MIP methods that build on an arc-based formulation do not scale well as the number of binary variables…

    Submitted 13 February, 2023; originally announced February 2023.

  35. arXiv:2302.01172  [pdf, other]

    cs.LG

    STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition

    Authors: Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh

    Abstract: Recent innovations on hardware (e.g. Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference. However, state-of-the-art learning recipes in this regime (e.g. SR-STE) are proposed for non-adaptive optimizers like momentum SGD, while incurring non-trivial accuracy drop for Adam-trained models like attention-based LLMs. In this paper, we first demonstr…

    Submitted 2 February, 2023; originally announced February 2023.

  36. arXiv:2301.12158  [pdf, other]

    cs.AI

    A System for Human-AI collaboration for Online Customer Support

    Authors: Debayan Banerjee, Mathis Poser, Christina Wiethof, Varun Shankar Subramanian, Richard Paucar, Eva A. C. Bittner, Chris Biemann

    Abstract: AI-enabled chat bots have recently been put to use to answer customer service queries; however, users commonly report that bots lack a personal touch and are often unable to understand the real intent of the user's question. To this end, it is desirable to have human involvement in the customer servicing process. In this work, we present a system where a human support agent collaborates…

    Submitted 7 February, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

  37. arXiv:2301.11936  [pdf, other]

    quant-ph cs.LG stat.ML

    Quantum Ridgelet Transform: Winning Lottery Ticket of Neural Networks with Quantum Computation

    Authors: Hayata Yamasaki, Sathyawageeswar Subramanian, Satoshi Hayakawa, Sho Sonoda

    Abstract: A significant challenge in the field of quantum machine learning (QML) is to establish applications of quantum computation to accelerate common tasks in machine learning such as those for neural networks. Ridgelet transform has been a fundamental mathematical tool in the theoretical studies of neural networks, but the practical applicability of ridgelet transform to conducting learning tasks was l…

    Submitted 11 September, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: 27 pages, 4 figures

    Journal ref: Proceedings of the 40th International Conference on Machine Learning (ICML2023) https://proceedings.mlr.press/v202/yamasaki23a.html

  38. arXiv:2301.11153  [pdf, other]

    cs.LG cs.AI cs.MA

    Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning

    Authors: Sriram Ganapathi Subramanian, Matthew E. Taylor, Kate Larson, Mark Crowley

    Abstract: Multi-agent reinforcement learning typically suffers from the problem of sample inefficiency, where learning suitable policies involves the use of many data samples. Learning from external demonstrators is a possible solution that mitigates this problem. However, most prior approaches in this area assume the presence of a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors)…

    Submitted 2 March, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: Paper to appear in AAMAS 2023, London, UK

  39. arXiv:2301.10140  [pdf, other]

    cs.DL cs.CL

    The Semantic Scholar Open Data Platform

    Authors: Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin , et al. (23 additional authors not shown)

    Abstract: The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF conte…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: 8 pages, 6 figures

  40. arXiv:2212.07327  [pdf, other]

    eess.AS cs.SD

    Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

    Authors: Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

    Abstract: Emulating the human ability to solve the cocktail party problem, i.e., focus on a source of interest in a complex acoustic scene, is a long standing goal of audio source separation research. Much of this research investigates separating speech from noise, speech from speech, musical instruments from each other, or sound events from each other. In this paper, we focus on the cocktail fork problem,…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Submitted to IEEE TASLP (In review), 13 pages, 6 figures

  41. arXiv:2212.05933  [pdf, other]

    q-fin.ST cs.LG

    Nostradamus: Weathering Worth

    Authors: Alapan Chaudhuri, Zeeshan Ahmed, Ashwin Rao, Shivansh Subramanian, Shreyas Pradhan, Abhishek Mittal

    Abstract: Nostradamus, inspired by the French astrologer and reputed seer, is a detailed study exploring relations between environmental factors and changes in the stock market. In this paper, we analyze associative correlation and causation between environmental elements (including natural disasters, climate and weather conditions) and stock prices, using historical stock market data, historical climate da…

    Submitted 17 January, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: 13 pages, 13 figures; updated abstract; updated format to Springer LNCS

  42. arXiv:2212.03348  [pdf, ps, other]

    quant-ph cs.CC cs.DS

    Quantum Worst-Case to Average-Case Reductions for All Linear Problems

    Authors: Vahid R. Asadi, Alexander Golovnev, Tom Gur, Igor Shinkar, Sathyawageeswar Subramanian

    Abstract: We study the problem of designing worst-case to average-case reductions for quantum algorithms. For all linear problems, we provide an explicit and efficient transformation of quantum algorithms that are only correct on a small (even sub-constant) fraction of their inputs into ones that are correct on all inputs. This stands in contrast to the classical setting, where such results are only known f…

    Submitted 6 December, 2022; originally announced December 2022.

  43. arXiv:2211.08303  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD stat.ML

    Reverberation as Supervision for Speech Separation

    Authors: Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux

    Abstract: This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation. Prior methods for unsupervised separation required the synthesis of mixtures of mixtures or assumed the existence of a teacher model, making them difficult to consider as potential methods explaining the emergence of separation abilities in an animal's audito…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 5 pages, 2 figures, 4 tables. Submitted to ICASSP 2023

  44. arXiv:2210.12504  [pdf, other]

    cs.LG cs.AI cs.CV physics.ao-ph

    Generative Modeling of High-resolution Global Precipitation Forecasts

    Authors: James Duncan, Shashank Subramanian, Peter Harrington

    Abstract: Forecasting global precipitation patterns and, in particular, extreme precipitation events is of critical importance to preparing for and adapting to climate change. Making accurate high-resolution precipitation forecasts using traditional physical models remains a major challenge in operational weather forecasting as they incur substantial computational costs and struggle to achieve sufficient fo…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022 Tackling Climate Change with Machine Learning Workshop

  45. arXiv:2209.07617  [pdf, other]

    cs.LG cs.AI cs.AR cs.PF

    Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask

    Authors: Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna

    Abstract: Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DNNs). Among different categories of sparsity, structured sparsity has gained more attention due to its efficient execution on modern accelerators. Particularly, N:M sparsity is attractive because there are already hardware accelerator architectures that can leverage certain forms of N:M structured sp…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: 11 pages, 2 figures, and 9 tables. Published at the ICML Workshop on Sparsity in Neural Networks Advancing Understanding and Practice, 2022. First two authors contributed equally

  46. arXiv:2208.05419  [pdf, ps, other]

    physics.ao-ph cs.AI cs.CV cs.LG cs.PF

    FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators

    Authors: Thorsten Kurth, Shashank Subramanian, Peter Harrington, Jaideep Pathak, Morteza Mardani, David Hall, Andrea Miele, Karthik Kashinath, Animashree Anandkumar

    Abstract: Extreme weather amplified by climate change is causing increasingly devastating impacts across the globe. The current use of physics-based numerical weather prediction (NWP) limits accuracy due to high computational cost and strict time-to-solution limits. We report that a data-driven deep learning Earth system emulator, FourCastNet, can predict global weather and generate medium-range forecasts f…

    Submitted 8 August, 2022; originally announced August 2022.

  47. arXiv:2207.10163  [pdf, other]

    math.OC cs.LG

    Constrained Prescriptive Trees via Column Generation

    Authors: Shivaram Subramanian, Wei Sun, Youssef Drissi, Markus Ettl

    Abstract: With the abundance of available data, many enterprises seek to implement data-driven prescriptive analytics to help them make informed decisions. These prescriptive policies need to satisfy operational constraints, and proactively eliminate rule conflicts, both of which are ubiquitous in practice. It is also desirable for them to be simple and interpretable, so they can be easily verified and impl…

    Submitted 20 July, 2022; originally announced July 2022.

  48. arXiv:2207.04084  [pdf, other]

    cs.LG physics.comp-ph

    Adaptive Self-supervision Algorithms for Physics-informed Neural Networks

    Authors: Shashank Subramanian, Robert M. Kirby, Michael W. Mahoney, Amir Gholami

    Abstract: Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adaptin…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: 15 pages

  49. Software Engineering Process and Methodology in Blockchain-Oriented Software Development: A Systematic Study

    Authors: Md Jobair Hossain Faruk, Santhiya Subramanian, Hossain Shahriar, Maria Valero, Xia Li, Masrura Tasnim

    Abstract: Software Engineering is the process of a systematic, disciplined, quantifiable approach that has significant impact on large-scale and complex software development. Scores of well-established software process models have long been adopted in the software development life cycle that pour stakeholders towards the completion of final software product development. Within the boundary of advanced techn…

    Submitted 2 July, 2022; originally announced July 2022.

    Journal ref: 2022 IEEE/ACIS 20th International Conference on Software Engineering Research, Management and Applications (SERA)

  50. arXiv:2206.01137  [pdf, other]

    cs.CL cs.LG

    Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation

    Authors: Virginia Adams, Sandeep Subramanian, Mike Chrzanowski, Oleksii Hrinchuk, Oleksii Kuchaiev

    Abstract: General translation models often still struggle to generate accurate translations in specialized domains. To guide machine translation practitioners and characterize the effectiveness of domain adaptation methods under different data availability scenarios, we conduct an in-depth empirical exploration of monolingual and parallel data approaches to domain adaptation of pre-trained, third-party, NMT…

    Submitted 2 June, 2022; originally announced June 2022.