Showing 1–50 of 77 results for author: Panda, R

Searching in archive cs. Search in all archives.
  1. arXiv:2406.12034  [pdf, other]

    cs.CL cs.LG

    Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

    Authors: Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter

    Abstract: We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipped with a shared base LLM and incorporating self-optimized routing. This allows for dynamic a…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2405.12981  [pdf, other]

    cs.LG cs.CL

    Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

    Authors: William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan Kelly

    Abstract: Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory required to store the KV cache can become prohibitive at long sequence lengths and large batch sizes. Since the invention of the transformer, two of the most effective interventions discovered for reducing the size of the KV cache…

    Submitted 21 May, 2024; originally announced May 2024.

  3. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  4. arXiv:2404.05567  [pdf, other]

    cs.LG cs.AI cs.CL

    Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

    Authors: Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda

    Abstract: Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4$\times$ compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally require 2-4$\times$ more parameters to achieve comparable performance to a dense model, which incurs larger GPU memory requirements and makes MoE models less…

    Submitted 8 April, 2024; originally announced April 2024.

  5. arXiv:2404.03605  [pdf, other]

    cs.LG cs.CL

    Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

    Authors: Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim

    Abstract: We consider the problem of accurate quantization for language models, where both the weights and activations are uniformly quantized to 4 bits per parameter, the lowest bitwidth format natively supported by GPU hardware. In this context, the key challenge is activation quantization: it is known that language models contain outlier channels whose values on average are orders of magnitude higher tha…

    Submitted 4 April, 2024; originally announced April 2024.

  6. arXiv:2403.08245  [pdf, other]

    cs.LG cs.DC

    Scattered Mixture-of-Experts Implementation

    Authors: Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville

    Abstract: We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs. ScatterMoE builds upon existing implementations and overcomes some of their limitations to improve inference and training speed and reduce memory footprint. It achieves this by avoiding padding and excessive copies of the input. We introduce ParallelLinear, the main component we use to build our…

    Submitted 13 March, 2024; originally announced March 2024.

  7. arXiv:2402.10171  [pdf, other]

    cs.CL cs.AI

    Data Engineering for Scaling Language Models to 128K Context

    Authors: Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng

    Abstract: We study the continual pretraining recipe for scaling language models' context lengths to 128K, with a focus on data engineering. We hypothesize that long context modeling, in particular \textit{the ability to utilize information at arbitrary input locations}, is a capability that is mostly already acquired through large-scale pretraining, and that this capability can be readily extended to contex…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Code at https://github.com/FranxYao/Long-Context-Data-Engineering

  8. arXiv:2402.09615  [pdf, other]

    cs.CL cs.AI cs.LG

    API Pack: A Massive Multi-Programming Language Dataset for API Call Generation

    Authors: Zhen Guo, Adriana Meza Soria, Wei Sun, Yikang Shen, Rameswar Panda

    Abstract: We introduce API Pack, a massive multi-programming language dataset containing more than 1 million instruction-API call pairs to improve the API call generation capabilities of large language models. By fine-tuning CodeLlama-13B on 20,000 Python instances from API Pack, we enable it to outperform GPT-3.5 and GPT-4 in generating unseen API calls. Fine-tuning on API Pack also facilitates cross-progr…

    Submitted 3 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  9. arXiv:2402.02318  [pdf, other]

    cs.LG cs.CL

    Diversity Measurement and Subset Selection for Instruction Tuning Datasets

    Authors: Peiqi Wang, Yikang Shen, Zhen Guo, Matthew Stallone, Yoon Kim, Polina Golland, Rameswar Panda

    Abstract: We aim to select data subsets for the fine-tuning of large language models to more effectively follow instructions. Prior work has emphasized the importance of diversity in dataset curation but relied on heuristics such as the number of tasks. In this paper, we use determinantal point processes to capture the diversity and quality of instruction tuning datasets for subset selection. We propose to…

    Submitted 3 February, 2024; originally announced February 2024.

  10. arXiv:2312.06635  [pdf, other]

    cs.LG cs.CL

    Gated Linear Attention Transformers with Hardware-Efficient Training

    Authors: Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim

    Abstract: Transformers with linear attention allow for efficient parallel training but can simultaneously be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear-time inference complexity. However, linear attention generally underperforms ordinary softmax attention. Moreover, current implementations of linear attention lack I/O-awareness and are thus slower than highly optimized…

    Submitted 5 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: ICML camera ready

  11. arXiv:2311.06231  [pdf, other]

    cs.CV

    Learning Human Action Recognition Representations Without Real Humans

    Authors: Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris

    Abstract: Pre-training on massive video datasets has become essential to achieve high action recognition performance on smaller downstream datasets. However, most large-scale video datasets contain images of people and hence are accompanied with issues related to privacy, ethics, and data protection, often preventing them from being publicly shared for reproducible research. Existing work has attempted to a…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 19 pages, 7 figures, 2023 NeurIPS Datasets and Benchmarks Track

  12. arXiv:2310.07889  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    LangNav: Language as a Perceptual Representation for Navigation

    Authors: Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim

    Abstract: We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings. Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent's egocentric panoramic view at each time step into natural language descriptions. We then finetune a pretrained language model to select an action, base…

    Submitted 30 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  13. arXiv:2308.13604  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.SI physics.data-an physics.soc-ph

    Network science Ising states of matter

    Authors: Hanlin Sun, Rajat Kumar Panda, Roberto Verdel, Alex Rodriguez, Marcello Dalmonte, Ginestra Bianconi

    Abstract: Network science provides very powerful tools for extracting information from interacting data. Although recently the unsupervised detection of phases of matter using machine learning has raised significant interest, the full prediction power of network science has not yet been systematically explored in this context. Here we fill this gap by providing an in-depth statistical, combinatorial, geomet…

    Submitted 9 May, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: 17 pages, 18 figures

  14. arXiv:2305.19595  [pdf, other]

    cs.CV

    Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models

    Authors: Sivan Doveh, Assaf Arbelle, Sivan Harary, Roei Herzig, Donghyun Kim, Paola Cascante-Bonilla, Amit Alfassy, Rameswar Panda, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky

    Abstract: Vision and Language (VL) models offer an effective method for aligning representation spaces of images and text, leading to numerous applications such as cross-modal retrieval, visual question answering, captioning, and more. However, the aligned image-text spaces learned by all the popular VL models are still suffering from the so-called `object bias' - their representations behave as `bags of no…

    Submitted 1 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  15. arXiv:2303.17590  [pdf, other]

    cs.CV cs.CL

    Going Beyond Nouns With Vision & Language Models Using Synthetic Data

    Authors: Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky

    Abstract: Large-scale pre-trained Vision & Language (VL) models have shown remarkable performance in many applications, enabling replacing a fixed set of supported classes with zero-shot open vocabulary reasoning over (almost arbitrary) natural language prompts. However, recent works have uncovered a fundamental weakness of these models. For example, their difficulty to understand Visual Language Concepts (…

    Submitted 30 August, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023. Project page: https://synthetic-vic.github.io/

  16. arXiv:2303.09639  [pdf, other]

    cs.CL

    Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models

    Authors: Aashka Trivedi, Takuma Udagawa, Michele Merler, Rameswar Panda, Yousef El-Kurdi, Bishwaranjan Bhattacharjee

    Abstract: Large pretrained language models have achieved state-of-the-art results on a variety of downstream tasks. Knowledge Distillation (KD) into a smaller student model addresses their inefficiency, allowing for deployment in resource-constrained environments. However, KD can be ineffective when the student is manually selected from a set of existing options, since it can be a sub-optimal choice within…

    Submitted 13 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: 11 pages, 5 figures

  17. arXiv:2303.08914  [pdf, other]

    cs.CV

    MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

    Authors: Wei Lin, Leonid Karlinsky, Nina Shvetsova, Horst Possegger, Mateusz Kozinski, Rameswar Panda, Rogerio Feris, Hilde Kuehne, Horst Bischof

    Abstract: Large scale Vision-Language (VL) models have shown tremendous success in aligning representations between visual and text modalities. This enables remarkable progress in zero-shot recognition, image generation & editing, and many other exciting tasks. However, VL models tend to over-represent objects while paying much less attention to verbs, and require additional tuning on video data for best ze…

    Submitted 22 July, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted at ICCV 2023

  18. arXiv:2303.02861  [pdf, other]

    cs.CL

    Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

    Authors: Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

    Abstract: Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks. However, existing methods typically learn soft prompt vectors from scratch, and it has not been clear how to exploit the rich cross-task knowledge with prompt vectors in a…

    Submitted 5 March, 2023; originally announced March 2023.

    Comments: ICLR 2023. Project page: https://zhenwang9102.github.io/mpt.html

  19. arXiv:2303.00980  [pdf, other]

    cs.LG

    Learning to Grow Pretrained Models for Efficient Transformer Training

    Authors: Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David Daniel Cox, Zhangyang Wang, Yoon Kim

    Abstract: Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis. New instances of such models are typically trained completely from scratch, despite the fact that they are often just scaled-up versions of their smaller counterparts. How can we use the implicit knowledge in the…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: International Conference on Learning Representations (ICLR), 2023

  20. arXiv:2302.07253  [pdf, other]

    cs.LG cond-mat.dis-nn cs.CV q-bio.NC stat.ML

    Energy Transformer

    Authors: Benjamin Hoover, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik Strobelt, Duen Horng Chau, Mohammed J. Zaki, Dmitry Krotov

    Abstract: Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory. Attention is the power-house driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not st…

    Submitted 31 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Journal ref: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  21. arXiv:2302.04249  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Federated Minimax Optimization with Client Heterogeneity

    Authors: Pranay Sharma, Rohan Panda, Gauri Joshi

    Abstract: Minimax optimization has seen a surge in interest with the advent of modern applications such as GANs, and it is inherently more challenging than simple minimization. The difficulty is exacerbated by the training data residing at multiple edge devices or \textit{clients}, especially when these clients can have heterogeneous datasets and local computation capabilities. We propose a general federate…

    Submitted 9 February, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 52 pages, 8 figures

  22. arXiv:2212.09864  [pdf, other]

    cs.CL cs.AI

    Synthetic Pre-Training Tasks for Neural Machine Translation

    Authors: Zexue He, Graeme Blackwood, Rameswar Panda, Julian McAuley, Rogerio Feris

    Abstract: Pre-training models with large crawled corpora can lead to issues such as toxicity and bias, as well as copyright and privacy concerns. A promising way of alleviating such concerns is to conduct pre-training with synthetic tasks and data, since no real-world information is ingested by the model. Our goal in this paper is to understand the factors that contribute to the effectiveness of pre-trainin…

    Submitted 30 May, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL2023-Findings. Newly added Phrase-cat for synthetic pre-training. 17 pages including 5-page appendix

  23. arXiv:2211.13218  [pdf, other]

    cs.CV cs.AI cs.LG

    CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning

    Authors: James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, Zsolt Kira

    Abstract: Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data. Typical solutions for this continual learning problem require extensive rehearsal of previously seen data, which increases memory costs and may violate data privacy. Recently, the emergence of large-scale pre-trained vision transformer models has e…

    Submitted 30 March, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  24. arXiv:2211.11733  [pdf, other]

    cs.CV

    Teaching Structured Vision&Language Concepts to Vision&Language Models

    Authors: Sivan Doveh, Assaf Arbelle, Sivan Harary, Rameswar Panda, Roei Herzig, Eli Schwartz, Donghyun Kim, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky

    Abstract: Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks. However, some aspects of complex language understanding still remain a challenge. We introduce the collective notion of Structured Vision&Language Concepts (SVLC) which includes object attributes, relations, and states which are present in the text and visible in the image. Recent studies have…

    Submitted 30 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Journal ref: CVPR 2023

  25. arXiv:2211.09790  [pdf, other]

    cs.LG cs.AI cs.CV

    ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

    Authors: James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

    Abstract: Recently, large-scale pre-trained Vision-and-Language (VL) foundation models have demonstrated remarkable capabilities in many zero-shot downstream tasks, achieving competitive results for recognizing objects defined by as little as short text prompts. However, it has also been shown that VL models are still brittle in Structured VL Concept (SVLC) reasoning, such as the ability to recognize object…

    Submitted 30 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  26. arXiv:2210.09486  [pdf, other]

    cs.CV

    Semi-Supervised Domain Adaptation with Auto-Encoder via Simultaneous Learning

    Authors: Md Mahmudur Rahman, Rameswar Panda, Mohammad Arif Ul Alam

    Abstract: We present a new semi-supervised domain adaptation framework that combines a novel auto-encoder-based domain adaptation model with a simultaneous learning scheme providing stable improvements over state-of-the-art domain adaptation models. Our framework holds strong distribution matching property by training both source and target auto-encoders using a novel simultaneous learning scheme on a singl…

    Submitted 17 October, 2022; originally announced October 2022.

  27. arXiv:2209.03648  [pdf, other]

    cs.CV

    FETA: Towards Specializing Foundation Models for Expert Task Applications

    Authors: Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, Peter W. J. Staar, Rogerio Feris, Leonid Karlinsky

    Abstract: Foundation Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high fidelity data synthesis, and out of domain generalization. However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g. retrieval of car manuals technical illustrations from language queries), data for which is either unseen or belonging to a long-tail…

    Submitted 19 December, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

  28. arXiv:2206.00100  [pdf, other]

    cs.CV cs.CL

    VALHALLA: Visual Hallucination for Machine Translation

    Authors: Yi Li, Rameswar Panda, Yoon Kim, Chun-Fu Chen, Rogerio Feris, David Cox, Nuno Vasconcelos

    Abstract: Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over the conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a…

    Submitted 31 May, 2022; originally announced June 2022.

    Comments: CVPR 2022

  29. arXiv:2203.04850  [pdf, other]

    math.OC cs.DC cs.LG

    Federated Minimax Optimization: Improved Convergence Analyses and Algorithms

    Authors: Pranay Sharma, Rohan Panda, Gauri Joshi, Pramod K. Varshney

    Abstract: In this paper, we consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs. Large-scale edge-based collection of training data in these applications calls for communication-efficient distributed optimization algorithms, such as those used in federated learning, to process the data. In this paper, we analyze Local stochastic grad…

    Submitted 9 March, 2022; originally announced March 2022.

    Comments: 52 pages, 4 figures

  30. arXiv:2112.00054  [pdf, other]

    cs.CV cs.LG

    Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data

    Authors: Samarth Mishra, Rameswar Panda, Cheng Perng Phoo, Chun-Fu Chen, Leonid Karlinsky, Kate Saenko, Venkatesh Saligrama, Rogerio S. Feris

    Abstract: Pre-training models on Imagenet or other massive datasets of real images has led to major advances in computer vision, albeit accompanied with shortcomings related to curation cost, privacy, usage rights, and ethical issues. In this paper, for the first time, we study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks from very di…

    Submitted 28 March, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: Accepted to CVPR'22

  31. arXiv:2111.04823  [pdf, other]

    cs.CL cs.CV cs.MM cs.SD eess.AS eess.IV

    Cascaded Multilingual Audio-Visual Learning from Videos

    Authors: Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

    Abstract: In this paper, we explore self-supervised audio-visual models that learn from instructional videos. Prior work has shown that these models can relate spoken words and sounds to visual content after training on a large-scale dataset of videos, but they were only trained and evaluated on videos in English. To learn multilingual audio-visual representations, we propose a cascaded approach that levera…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: Presented at Interspeech 2021. This version contains updated results using the YouCook-Japanese dataset

  32. arXiv:2110.15403  [pdf, other]

    cs.LG stat.ML

    Selective Regression Under Fairness Criteria

    Authors: Abhin Shah, Yuheng Bu, Joshua Ka-Wing Lee, Subhro Das, Rameswar Panda, Prasanna Sattigeri, Gregory W. Wornell

    Abstract: Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we redu…

    Submitted 14 July, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

  33. arXiv:2110.15128  [pdf, other]

    cs.CV

    Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

    Authors: Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

    Abstract: Unsupervised domain adaptation which aims to adapt models trained on a labeled source domain to a completely unlabeled target domain has attracted much attention in recent years. While many domain adaptation techniques have been proposed for images, the problem of unsupervised domain adaptation in videos remains largely underexplored. In this paper, we introduce Contrast and Mix (CoMix), a new con…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021. Project page: https://cvir.github.io/projects/comix

  34. arXiv:2109.12405  [pdf, other]

    cs.AR

    CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems

    Authors: Lokesh Siddhu, Rajesh Kedia, Shailja Pandey, Martin Rapp, Anuj Pathania, Jörg Henkel, Preeti Ranjan Panda

    Abstract: Processing cores and the accompanying main memory working in tandem enable the modern processors. Dissipating heat produced from computation, memory access remains a significant problem for processors. Therefore, processor thermal management continues to be an active research topic. Most thermal management research takes place using simulations, given the challenges of measuring temperature in rea…

    Submitted 16 March, 2022; v1 submitted 25 September, 2021; originally announced September 2021.

    Comments: https://github.com/marg-tools/CoMeT

  35. arXiv:2108.10394  [pdf, other]

    cs.CV

    Dynamic Network Quantization for Efficient Video Inference

    Authors: Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

    Abstract: Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition. Motivated by the effectiveness of quantization for boosting efficiency, in this paper, we propose a dynamic network quantization framework, that selects optimal precision…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 Camera Ready Version

  36. arXiv:2106.14104  [pdf, ps, other]

    cs.CV

    Can An Image Classifier Suffice For Action Recognition?

    Authors: Quanfu Fan, Chun-Fu Chen, Rameswar Panda

    Abstract: We explore a new perspective on video understanding by casting the video recognition problem as an image recognition task. Our approach rearranges input video frames into super images, which allow for training an image classifier directly to fulfill the task of action recognition, in exactly the same way as image classification. With such a simple idea, we show that transformer-based image classif…

    Submitted 25 April, 2022; v1 submitted 26 June, 2021; originally announced June 2021.

  37. arXiv:2106.12620  [pdf, other]

    cs.CV

    IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers

    Authors: Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, Aude Oliva

    Abstract: The self-attention-based model, transformer, is recently becoming the leading backbone in the field of computer vision. In spite of the impressive success made by transformers in a variety of vision tasks, it still suffers from heavy computation and intensive memory costs. To address this limitation, this paper presents an Interpretability-Aware REDundancy REDuction framework (IA-RED$^2$). We star…

    Submitted 26 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted in NeurIPS 2021

  38. arXiv:2106.07807  [pdf, other]

    cs.CV

    Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

    Authors: Ashraful Islam, Chun-Fu Chen, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Richard J. Radke

    Abstract: Most existing works in few-shot learning rely on meta-learning the network on a large base dataset which is typically from the same domain as the target dataset. We tackle the problem of cross-domain few-shot learning where there is a large shift between the base and target domain. The problem of cross-domain few-shot recognition with unlabeled target data is largely unaddressed in the literature.…

    Submitted 1 November, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted to NeurIPS 2021

  39. arXiv:2106.02689  [pdf, ps, other]

    cs.CV

    RegionViT: Regional-to-Local Attention for Vision Transformers

    Authors: Chun-Fu Chen, Rameswar Panda, Quanfu Fan

    Abstract: Vision transformer (ViT) has recently shown its strong capability in achieving comparable results to convolutional neural networks (CNNs) on image classification. However, vanilla ViT simply inherits the same architecture from the natural language processing directly, which is often not optimized for vision applications. Motivated by this, in this paper, we propose a new architecture that adopts t…

    Submitted 30 March, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: add more results and link to codes and models. https://github.com/ibm/regionvit, formatted with ICLR style

  40. arXiv:2105.05165  [pdf, other]

    cs.CV cs.AI cs.LG

    AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

    Authors: Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

    Abstract: Multi-modal learning, which focuses on utilizing various modalities to improve the performance of a model, is widely used in video recognition. While traditional multi-modal learning offers excellent recognition results, its computational expense limits its impact for many real-world applications. In this paper, we propose an adaptive multi-modal learning framework, called AdaMML, that selects on-…

    Submitted 12 May, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

  41. arXiv:2104.12671  [pdf, other]

    cs.CV

    Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

    Authors: Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

    Abstract: Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities. In this context, this paper proposes a self-supervised training framework that learns a common multimodal embedding space that, in addition to sharing representations across different modalitie…

    Submitted 3 September, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: To be presented at ICCV 2021

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 8012-8021

  42. arXiv:2104.09829  [pdf, other]

    cs.CV

    Detector-Free Weakly Supervised Grounding by Separation

    Authors: Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

    Abstract: Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object de…

    Submitted 20 April, 2021; originally announced April 2021.

  43. arXiv:2103.14899  [pdf, ps, other]

    cs.CV

    CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

    Authors: Chun-Fu Chen, Quanfu Fan, Rameswar Panda

    Abstract: The recently developed vision transformer (ViT) has achieved promising results on image classification compared to convolutional neural networks. Inspired by this, in this paper, we study how to learn multi-scale feature representations in transformer models for image classification. To this end, we propose a dual-branch transformer to combine image patches (i.e., tokens in a transformer) of diffe…

    Submitted 22 August, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

    Comments: Accepted by ICCV 2021

  44. arXiv:2103.13517  [pdf, other]

    cs.CV

    A Broad Study on the Transferability of Visual Representations with Contrastive Learning

    Authors: Ashraful Islam, Chun-Fu Chen, Rameswar Panda, Leonid Karlinsky, Richard Radke, Rogerio Feris

    Abstract: Tremendous progress has been made in visual representation learning, notably with the recent success of self-supervised contrastive learning methods. Supervised contrastive learning has also been shown to outperform its cross-entropy counterparts by leveraging labels for choosing where to contrast. However, there has been little work to explore the transfer capability of contrastive learning to a…

    Submitted 15 August, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted to ICCV 2021

  45. arXiv:2103.01435  [pdf, other]

    cs.CV

    Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

    Authors: Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

    Abstract: Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models for different constraints, adaptive quantization enables us to flexibly adjust the bit-widths of a single deep network during inference for instant adaptation in…

    Submitted 16 September, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

  46. arXiv:2102.07887  [pdf, other]

    cs.CV

    VA-RED$^2$: Video Adaptive Redundancy Reduction

    Authors: Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

    Abstract: Performing inference on deep learning models for videos remains a challenge due to the large amount of computational resources required to achieve robust recognition. An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both. The type of redundant features depe…

    Submitted 4 October, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: Accepted in ICLR 2021

  47. arXiv:2102.05775  [pdf, other]

    cs.CV

    AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

    Authors: Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris

    Abstract: Temporal modelling is the key for efficient video action recognition. While understanding temporal information can improve recognition accuracy for dynamic actions, removing temporal redundancy and reusing past features can significantly save computation leading to efficient action recognition. In this paper, we introduce an adaptive temporal fusion network, called AdaFuse, that dynamically fuses…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: Accepted to ICLR 2021

  48. arXiv:2102.02751  [pdf, other]

    cs.CV

    Semi-Supervised Action Recognition with Temporal Contrastive Learning

    Authors: Ankit Singh, Omprakash Chakraborty, Ashutosh Varshney, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

    Abstract: Learning to recognize actions from only a handful of labeled videos is a challenging problem due to the scarcity of tediously collected activity labels. We approach this problem by learning a two-pathway temporal contrastive model using unlabeled videos at two different speeds leveraging the fact that changing video speed does not change an action. Specifically, we propose to maximize the similari…

    Submitted 29 March, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: Accepted in CVPR 2021

  49. arXiv:2012.15259  [pdf, other]

    cs.LG cs.AI cs.IT stat.ML

    A Maximal Correlation Approach to Imposing Fairness in Machine Learning

    Authors: Joshua Lee, Yuheng Bu, Prasanna Sattigeri, Rameswar Panda, Gregory Wornell, Leonid Karlinsky, Rogerio Feris

    Abstract: As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant. We explore the problem of algorithmic fairness, taking an information-theoretic view. The maximal correlation framework is introduced for expressing fairness constraints and shown to be capable of being used to derive regularizer…

    Submitted 30 December, 2020; originally announced December 2020.

    Comments: 9 pages, 4 figures

  50. arXiv:2012.03358  [pdf, other]

    cs.CV

    Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation

    Authors: Aadarsh Sahoo, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das

    Abstract: Partial domain adaptation which assumes that the unknown target label space is a subset of the source label space has attracted much attention in computer vision. Despite recent progress, existing methods often suffer from three key problems: negative transfer, lack of discriminability, and domain invariance in the latent space. To alleviate the above issues, we develop a novel 'Select, Label, and…

    Submitted 3 January, 2023; v1 submitted 6 December, 2020; originally announced December 2020.

    Comments: Accepted to WACV 2023. Project page: https://cvir.github.io/projects/slm.html