Showing 1–50 of 241 results for author: Garg, A

Searching in archive cs.
  1. arXiv:2407.20192  [pdf, other]

    cs.LG eess.SY

    Time series forecasting with high stakes: A field study of the air cargo industry

    Authors: Abhinav Garg, Naman Shukla

    Abstract: Time series forecasting in the air cargo industry presents unique challenges due to volatile market dynamics and the significant impact of accurate forecasts on generated revenue. This paper explores a comprehensive approach to demand forecasting at the origin-destination (O&D) level, focusing on the development and implementation of machine learning models in decision-making for the air cargo in…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: The 10th Mining and Learning from Time Series Workshop: From Classical Methods to LLMs. SIGKDD, Barcelona, Spain, 6 pages

  2. arXiv:2407.16503  [pdf, other]

    cs.CV eess.IV

    HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images

    Authors: Shreyas Singh, Aryan Garg, Kaushik Mitra

    Abstract: The recent advent of 3D Gaussian Splatting (3DGS) has revolutionized the 3D scene reconstruction space enabling high-fidelity novel view synthesis in real-time. However, with the exception of RawNeRF, all prior 3DGS and NeRF-based methods rely on 8-bit tone-mapped Low Dynamic Range (LDR) images for scene reconstruction. Such methods struggle to achieve accurate reconstructions in scenes that requi…

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2407.15840  [pdf, other]

    cs.RO

    QueST: Self-Supervised Skill Abstractions for Learning Continuous Control

    Authors: Atharva Mete, Haotian Xue, Albert Wilcox, Yongxin Chen, Animesh Garg

    Abstract: Generalization capabilities, or rather a lack thereof, is one of the most important unsolved problems in the field of robot learning, and while several large scale efforts have set out to tackle this problem, unsolved it remains. In this paper, we hypothesize that learning temporal action abstractions using latent variable models (LVMs), which learn to map data to a compressed latent space and bac…

    Submitted 22 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Keywords: Behavior Cloning, Action Quantization, Self Supervised Skill Abstraction, Few-shot Imitation Learning

  4. arXiv:2407.13833  [pdf, other]

    cs.CL cs.AI

    Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle

    Authors: Emman Haider, Daniel Perez-Becker, Thomas Portet, Piyush Madan, Amit Garg, David Majercak, Wen Wen, Dongwoo Kim, Ziyi Yang, Jianwen Zhang, Hiteshi Sharma, Blake Bullwinkel, Martin Pouliot, Amanda Minnich, Shiven Chawla, Solianna Herrera, Shahed Warreth, Maggie Engler, Gary Lopez, Nina Chikanov, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Roman Lutz, Richard Lundeen, Tori Westerhoff, et al. (5 additional authors not shown)

    Abstract: Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3…

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.02466  [pdf, other]

    cs.LG cs.AI cs.RO

    PWM: Policy Learning with Large World Models

    Authors: Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg

    Abstract: Reinforcement Learning (RL) has achieved impressive results on complex tasks but struggles in multi-task settings with different embodiments. World models offer scalability by learning a simulation of the environment, yet they often rely on inefficient gradient-free optimization methods. We introduce Policy learning with large World Models (PWM), a novel model-based RL algorithm that learns contin…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Visualizations and code available at https://www.imgeorgiev.com/pwm

  6. arXiv:2405.17784  [pdf, other]

    cs.LG cs.AI

    Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

    Authors: Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, Animesh Garg

    Abstract: Model-Free Reinforcement Learning (MFRL), leveraging the policy gradient theorem, has demonstrated considerable success in continuous control tasks. However, these approaches are plagued by high gradient variance due to zeroth-order gradient estimation, resulting in suboptimal policies. Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods employing differentiable simulation…

    Submitted 3 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Website https://adaptive-horizon-actor-critic.github.io/

  7. arXiv:2405.11823  [pdf, other]

    cs.CV

    Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction

    Authors: Aryan Garg, Raghav Mallampali, Akshat Joshi, Shrisudhan Govindarajan, Kaushik Mitra

    Abstract: Dual pixels contain disparity cues arising from the defocus blur. This disparity information is useful for many vision tasks ranging from autonomous driving to 3D creative realism. However, directly estimating disparity from dual pixels is less accurate. This work hypothesizes that distilling high-precision dark stereo knowledge, implicitly or explicitly, to efficient dual-pixel student networks e…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: International Conference of Computational Photography (ICCP 2024), 11 pages and 12 figures

  8. arXiv:2405.05876  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Composable Part-Based Manipulation

    Authors: Weiyu Liu, Jiayuan Mao, Joy Hsu, Tucker Hermans, Animesh Garg, Jiajun Wu

    Abstract: In this paper, we propose composable part-based manipulation (CPM), a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills. By considering the functional correspondences between object parts, we conceptualize functional actions, such as pouring and constrained placing, as combinations of differen…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Presented at CoRL 2023. For videos and additional results, see our website: https://cpmcorl2023.github.io/

  9. arXiv:2405.05376  [pdf, other]

    cs.CL

    Kreyòl-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages

    Authors: Nathaniel R. Robinson, Raj Dabre, Ammon Shurtz, Rasul Dent, Onenamiyi Onesi, Claire Bizon Monroc, Loïc Grobol, Hasan Muhammad, Ashi Garg, Naome A. Etori, Vijay Murari Tiyyala, Olanrewaju Samuel, Matthew Dean Stutzman, Bismarck Bamfo Odoom, Sanjeev Khudanpur, Stephen D. Richardson, Kenton Murray

    Abstract: A majority of language technologies are tailored for a small number of high-resource languages, while relatively many low-resource languages are neglected. One such group, Creole languages, have long been marginalized in academic study, though their speakers could benefit from machine translation (MT). These languages are predominantly used in much of Latin America, Africa and the Caribbean. We pr…

    Submitted 13 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: NAACL 2024

  10. arXiv:2405.05226  [pdf, other]

    cs.RO

    SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants

    Authors: Masoud Moghani, Lars Doorenbos, William Chung-Ho Panitch, Sean Huver, Mahdi Azizian, Ken Goldberg, Animesh Garg

    Abstract: In this work, we present SuFIA, the first framework for natural language-guided augmented dexterity for robotic surgical assistants. SuFIA incorporates the strong reasoning capabilities of large language models (LLMs) with perception modules to implement high-level planning and low-level control of a robot for surgical sub-task execution. This enables a learning-free approach to surgical augmented…

    Submitted 8 May, 2024; originally announced May 2024.

  11. arXiv:2405.00732  [pdf, other]

    cs.CL cs.AI cs.LG

    LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

    Authors: Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi

    Abstract: Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models (LLMs). LoRA reduces the number of trainable parameters and memory usage while achieving comparable performance to full fine-tuning. We aim to assess the viability of training and serving LLMs fine-tuned with LoRA in real-world applications. First, we…

    Submitted 29 April, 2024; originally announced May 2024.

  12. arXiv:2404.16027  [pdf, other]

    cs.RO

    ORBIT-Surgical: An Open-Simulation Framework for Learning Surgical Augmented Dexterity

    Authors: Qinxi Yu, Masoud Moghani, Karthik Dharmarajan, Vincent Schorp, William Chung-Ho Panitch, Jingzhou Liu, Kush Hari, Huang Huang, Mayank Mittal, Ken Goldberg, Animesh Garg

    Abstract: Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci…

    Submitted 24 April, 2024; originally announced April 2024.

  13. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  14. arXiv:2404.07428  [pdf, other]

    cs.RO cs.LG

    AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

    Authors: Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg

    Abstract: Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstra…

    Submitted 10 April, 2024; originally announced April 2024.

  15. arXiv:2404.04603  [pdf, ps, other]

    cs.HC cs.CY

    Analyzing LLM Usage in an Advanced Computing Class in India

    Authors: Anupam Garg, Aryaman Raina, Aryan Gupta, Jaskaran Singh, Manav Saini, Prachi Iiitd, Ronit Mehta, Rupin Oberoi, Sachin Sharma, Samyak Jain, Sarthak Tyagi, Utkarsh Arora, Dhruv Kumar

    Abstract: This study examines the use of large language models (LLMs) by undergraduate and graduate students for programming assignments in advanced computing classes. Unlike existing research, which primarily focuses on introductory classes and lacks in-depth analysis of actual student-LLM interactions, our work fills this gap. We conducted a comprehensive analysis involving 411 students from a Distributed…

    Submitted 26 July, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: Under review: 8 pages

  16. arXiv:2404.01339  [pdf, other]

    cs.CL cs.AI cs.HC

    Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation

    Authors: Rohan Chaudhury, Mihir Godbole, Aakash Garg, Jinsil Hwaryoung Seo

    Abstract: Contemporary conversational systems often present a significant limitation: their responses lack the emotional depth and disfluent characteristic of human interactions. This absence becomes particularly noticeable when users seek more personalized and empathetic interactions. Consequently, this makes them seem mechanical and less relatable to human users. Recognizing this gap, we embarked on a jou…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure, for associated code and media files, see https://github.com/Rohan-Chaudhury/Humane-Speech-Synthesis-through-Zero-Shot-Emotion-and-Disfluency-Generation

  17. arXiv:2404.00597  [pdf, other]

    cs.CV

    Parameter and Data-Efficient Spectral StyleDCGAN

    Authors: Aryan Garg

    Abstract: We present a simple, highly parameter, and data-efficient adversarial network for unconditional face generation. Our method: Spectral Style-DCGAN or SSD utilizes only 6.574 million parameters and 4739 dog faces from the Animal Faces HQ (AFHQ) dataset as training samples while preserving fidelity at low resolutions up to 64x64. Code available at https://github.com/Aryan-Garg/StyleDCGAN.

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Notable ICLR Tiny Paper 2024

  18. arXiv:2403.20116  [pdf, other]

    cs.RO

    LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving

    Authors: Pranjal Paul, Anant Garg, Tushar Choudhary, Arun Kumar Singh, K. Madhava Krishna

    Abstract: Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subjective to their "world understanding" which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, wh…

    Submitted 29 March, 2024; originally announced March 2024.

  19. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  20. arXiv:2402.15650  [pdf, ps, other]

    cs.LG cs.AI

    Multi-Constraint Safe RL with Objective Suppression for Safety-Critical Applications

    Authors: Zihan Zhou, Jonathan Booher, Khashayar Rohanimanesh, Wei Liu, Aleksandr Petiushko, Animesh Garg

    Abstract: Safe reinforcement learning tasks with multiple constraints are a challenging domain despite being very common in the real world. In safety-critical domains, properly handling the constraints becomes even more important. To address this challenge, we first describe the multi-constraint problem with a stronger Uniformly Constrained MDP (UCMDP) model; we then propose Objective Suppression, a novel m…

    Submitted 15 April, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  21. arXiv:2402.02612  [pdf, other]

    cs.RO

    Fast Explicit-Input Assistance for Teleoperation in Clutter

    Authors: Nick Walker, Xuning Yang, Animesh Garg, Maya Cakmak, Dieter Fox, Claudia Pérez-D'Arpino

    Abstract: The performance of prediction-based assistance for robot teleoperation degrades in unseen or goal-rich environments due to incorrect or quickly-changing intent inferences. Poor predictions can confuse operators or cause them to change their control input to implicitly signal their goal. We present a new assistance interface for robotic manipulation where an operator can explicitly communicate a ma…

    Submitted 2 April, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  22. arXiv:2401.17791  [pdf, other]

    cs.LG cs.AI

    Graph Transformers without Positional Encodings

    Authors: Ayush Garg

    Abstract: Recently, Transformers for graph representation learning have become increasingly popular, achieving state-of-the-art performance on a wide-variety of graph datasets, either alone or in combination with message-passing graph neural networks (MP-GNNs). Infusing graph inductive-biases in the innately structure-agnostic transformer architecture in the form of structural or positional encodings (PEs)…

    Submitted 6 May, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Independent Research

  23. arXiv:2401.10465  [pdf, other]

    cs.CL cs.SD eess.AS

    Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech

    Authors: Abhinav Garg, Jiyeon Kim, Sushil Khyalia, Chanwoo Kim, Dhananjaya Gowda

    Abstract: Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality Text-to-Speech (TTS) system. Most of the current G2P systems rely on carefully hand-crafted lexicons developed by experts. This poses a two-fold problem. Firstly, the lexicons are generated using a fixed phoneme set, usually, ARPABET or IPA, which might not be the most optimal way to represent phonemes for all languag…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  24. Exploring Content-Based and Meta-Data Analysis for Detecting Fake News Infodemic: A case study on COVID-19

    Authors: Oluwaseun Ajao, Ashish Garg, Marjory Da Costa-Abreu

    Abstract: The coronavirus pandemic (COVID-19) is probably the most disruptive global health disaster in recent history. It negatively impacted the whole world and virtually brought the global economy to a standstill. However, as the virus was spreading, infecting people and claiming thousands of lives so was the spread and propagation of fake news, misinformation and disinformation about the event. These in…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 8 pages, 5 figures, 3 tables, International Conference for Pattern Recognition Systems (ICPRS 2022)

    ACM Class: H.3.3

    Journal ref: In ICPRS 2022 (pp. 1-8). IEEE

  25. arXiv:2401.06949  [pdf, other]

    cs.RO cs.AI

    ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization

    Authors: Kourosh Darvish, Marta Skreta, Yuchi Zhao, Naruki Yoshikawa, Sagnik Som, Miroslav Bogdanovic, Yang Cao, Han Hao, Haoping Xu, Alán Aspuru-Guzik, Animesh Garg, Florian Shkurti

    Abstract: Chemistry experimentation is often resource- and labor-intensive. Despite the many benefits incurred by the integration of advanced and special-purpose lab equipment, many aspects of experimentation are still manually conducted by chemists, for example, polishing an electrode in electrochemistry experiments. Traditional lab automation infrastructure faces challenges when it comes to flexibly adapt…

    Submitted 12 January, 2024; originally announced January 2024.

  26. arXiv:2401.04157  [pdf, other]

    cs.RO

    RePLan: Robotic Replanning with Perception and Language Models

    Authors: Marta Skreta, Zihan Zhou, Jia Lin Yuan, Kourosh Darvish, Alán Aspuru-Guzik, Animesh Garg

    Abstract: Advancements in large language models (LLMs) have demonstrated their potential in facilitating high-level reasoning, logical reasoning and robotics planning. Recently, LLMs have also been able to generate reward functions for low-level robot actions, effectively bridging the interface between high-level planning and low-level robot control. However, the challenge remains that even with syntactical…

    Submitted 20 February, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  27. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  28. arXiv:2312.10728  [pdf, other]

    cs.AI

    Benchmarks for Physical Reasoning AI

    Authors: Andrew Melnik, Robin Schiewer, Moritz Lange, Andrei Muresanu, Mozhgan Saeidi, Animesh Garg, Helge Ritter

    Abstract: Physical reasoning is a crucial aspect in the development of general AI systems, given that human learning starts with interacting with the physical world before progressing to more complex concepts. Although researchers have studied and assessed the physical reasoning of AI approaches through various specific benchmarks, there is no comprehensive approach to evaluating and measuring progress. The…

    Submitted 17 December, 2023; originally announced December 2023.

  29. arXiv:2312.06134  [pdf, other]

    cs.CL cs.LG

    Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

    Authors: Dami Choi, Derrick Xin, Hamid Dadkhahi, Justin Gilmer, Ankush Garg, Orhan Firat, Chih-Kuan Yeh, Andrew M. Dai, Behrooz Ghorbani

    Abstract: In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks. We provide a thorough empirical study and analysis of this method's be…

    Submitted 11 December, 2023; originally announced December 2023.

  30. arXiv:2312.03864  [pdf, other]

    cs.RO

    Geometry Matching for Multi-Embodiment Grasping

    Authors: Maria Attarian, Muhammad Adil Asif, Jingzhou Liu, Ruthrash Hari, Animesh Garg, Igor Gilitschenski, Jonathan Tompson

    Abstract: Many existing learning-based grasping approaches concentrate on a single embodiment, provide limited generalization to higher DoF end-effectors and cannot capture a diverse set of grasp modes. We tackle the problem of grasping using multiple embodiments by learning rich geometric representations for both objects and end-effectors using Graph Neural Networks. Our novel method - GeoMatch - applies s…

    Submitted 6 December, 2023; originally announced December 2023.

    Journal ref: 7th Annual Conference on Robot Learning, 2023

  31. arXiv:2311.16552  [pdf, other]

    cs.CV cs.RO

    HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors

    Authors: Shutong Zhang, Yi-Ling Qiao, Guanglei Zhu, Eric Heiden, Dylan Turpin, Jingzhou Liu, Ming Lin, Miles Macklin, Animesh Garg

    Abstract: Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes by leveraging…

    Submitted 26 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  32. arXiv:2311.10741  [pdf]

    cs.CY

    Data Equity: Foundational Concepts for Generative AI

    Authors: JoAnn Stonier, Lauren Woodman, Majed Alshammari, Renée Cummings, Nighat Dad, Arti Garg, Alberto Giovanni Busetto, Katherine Hsiao, Maui Hudson, Parminder Jeet Singh, David Kanamugire, Astha Kapoor, Zheng Lei, Jacqueline Lu, Emna Mizouni, Angela Oduor Lungati, María Paz Canales Loebel, Arathi Sethumadhavan, Sarah Telford, Supheakmungkol Sarin, Kimmy Bettinger, Stephanie Teeuwen

    Abstract: This briefing paper focuses on data equity within foundation models, both in terms of the impact of Generative AI (genAI) on society and on the further development of genAI tools. GenAI promises immense potential to drive digital and social innovation, such as improving efficiency, enhancing creativity and augmenting existing data. GenAI has the potential to democratize access and usage of technol…

    Submitted 27 October, 2023; originally announced November 2023.

    Journal ref: World Economic Forum 2023

  33. arXiv:2311.07284  [pdf, ps, other]

    cs.DS cs.CC cs.LG

    Learning Arithmetic Formulas in the Presence of Noise: A General Framework and Applications to Unsupervised Learning

    Authors: Pritam Chandra, Ankit Garg, Neeraj Kayal, Kunal Mittal, Tanmay Sinha

    Abstract: We present a general framework for designing efficient algorithms for unsupervised learning problems, such as mixtures of Gaussians and subspace clustering. Our framework is based on a meta algorithm that learns arithmetic circuits in the presence of noise, using lower bounds. This builds upon the recent work of Garg, Kayal and Saha (FOCS 20), who designed such a framework for learning arithmetic…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 85 pages, comments welcome

  34. arXiv:2311.02536  [pdf, other]

    cs.CV

    Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models

    Authors: Jingru Yi, Burak Uzkent, Oana Ignat, Zili Li, Amanmeet Garg, Xiang Yu, Linda Liu

    Abstract: Grounding-based vision and language models have been successfully applied to low-level vision tasks, aiming to precisely locate objects referred in captions. The effectiveness of grounding representation learning heavily relies on the scale of the training dataset. Despite being a useful data enrichment strategy, data augmentation has received minimal attention in existing vision and language task…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV2024

  35. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  36. arXiv:2308.15705  [pdf]

    cs.CV

    Towards Earlier Detection of Oral Diseases On Smartphones Using Oral and Dental RGB Images

    Authors: Ayush Garg, Julia Lu, Anika Maji

    Abstract: Oral diseases such as periodontal (gum) diseases and dental caries (cavities) affect billions of people across the world today. However, previous state-of-the-art models have relied on X-ray images to detect oral diseases, making them inaccessible to remote monitoring, developing countries, and telemedicine. To combat this overuse of X-ray imagery, we propose a lightweight machine learning model c…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 10 pages, 6 figures, 1 formula. This research was conducted as a mentored project performed for a college course and research program at the University of California Santa Barbara's Summer Research Academies program

  37. arXiv:2308.07286  [pdf, other]

    cs.CL cs.LG

    The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation

    Authors: Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André F. T. Martins, Graham Neubig, Ankush Garg, Jonathan H. Clark, Markus Freitag, Orhan Firat

    Abstract: Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative development of MT systems. While considerable progress has been made on estimating a single scalar quality score, current metrics lack the informativeness of more detailed schemes that annotate individual errors, such as Multidimensional Quality Metrics (MQM). In this paper, we help fill this gap by pro…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: 19 pages

  38. arXiv:2307.12964  [pdf, other]

    cs.CV

    Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

    Authors: Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar

    Abstract: Text-to-video retrieval systems have recently made significant progress by utilizing pre-trained models trained on large-scale image-text pairs. However, most of the latest methods primarily focus on the video modality while disregarding the audio signal for this task. Nevertheless, a recent advancement by ECLIPSE has improved long-range text-to-video retrieval by developing an audiovisual video r…

    Submitted 18 October, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023

  39. arXiv:2306.08132  [pdf, other]

    cs.RO

    Fast-Grasp'D: Dexterous Multi-finger Grasp Generation Through Differentiable Simulation

    Authors: Dylan Turpin, Tao Zhong, Shutong Zhang, Guanglei Zhu, Jingzhou Liu, Ritvik Singh, Eric Heiden, Miles Macklin, Stavros Tsogkas, Sven Dickinson, Animesh Garg

    Abstract: Multi-finger grasping relies on high quality training data, which is hard to obtain: human data is hard to transfer and synthetic data relies on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable, and contact dynamics amenable to gradient-based optimization, we accelerate the search for high-quality grasps with fewer limiting assumptions. We present Grasp'…

    Submitted 13 June, 2023; originally announced June 2023.

  40. arXiv:2306.07179  [pdf, other]

    cs.LG stat.ML

    Benchmarking Neural Network Training Algorithms

    Authors: George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson

    Abstract: Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a communi…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 102 pages, 8 figures, 41 tables

  41. arXiv:2305.19486  [pdf, other]

    cs.CV

    Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation

    Authors: Arpit Garg, Cuong Nguyen, Rafael Felix, Thanh-Toan Do, Gustavo Carneiro

    Abstract: Deep learning faces a formidable challenge when handling noisy labels, as models tend to overfit samples affected by label noise. This challenge is further compounded by the presence of instance-dependent noise (IDN), a realistic form of label noise arising from ambiguous sample information. To address IDN, Label Noise Learning (LNL) incorporates a sample selection stage to differentiate clean and…

    Submitted 4 July, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ECCV 2024

  42. arXiv:2305.17565  [pdf, other]

    cs.CV cs.RO

    Self-Supervised Learning of Action Affordances as Interaction Modes

    Authors: Liquan Wang, Nikita Dvornik, Rafael Dubeau, Mayank Mittal, Animesh Garg

    Abstract: When humans perform a task with an articulated object, they interact with the object only in a handful of ways, while the space of all possible interactions is nearly endless. This is because humans have prior knowledge about what interactions are likely to be successful, i.e., to open a new door we first try the handle. While learning such priors without supervision is easy for humans, it is noto…

    Submitted 27 May, 2023; originally announced May 2023.

    Journal ref: 2023 International Conference on Robotics and Automation

  43. arXiv:2305.11281  [pdf, other]

    cs.CV cs.LG

    SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

    Authors: Ziyi Wu, Jingyu Hu, Wuyue Lu, Igor Gilitschenski, Animesh Garg

    Abstract: Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent approaches have made significant progress in unsupervised object discovery. In addition, slot-based representations hold great potential for generative modeling, su…

    Submitted 21 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 Spotlight. Project page: https://slotdiffusion.github.io/

  44. arXiv:2305.00508  [pdf, other]

    cs.LG cs.AI

    Learning Achievement Structure for Structured Exploration in Domains with Sparse Reward

    Authors: Zihan Zhou, Animesh Garg

    Abstract: We propose Structured Exploration with Achievements (SEA), a multi-stage reinforcement learning algorithm designed for achievement-based environments, a particular type of environment with an internal achievement set. SEA first uses offline data to learn a representation of the known achievements with a determinant loss function, then recovers the dependency graph of the learned achievements with…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: Published as a conference paper at ICLR 2023

  45. arXiv:2304.13265  [pdf, other]

    cs.CV

    StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

    Authors: Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, Allan D. Jepson

    Abstract: Instructional videos are an important resource to learn procedural tasks from human demonstrations. However, the instruction steps in such videos are typically short and sparse, with most of the video being irrelevant to the procedure. This motivates the need to temporally localize the instruction steps in such videos, i.e. the task called key-step localization. Traditional methods for key-step lo…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: CVPR'23

  46. arXiv:2303.14772  [pdf, other]

    cs.CV

    $Δ$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss

    Authors: Chaitanya Devaguptapu, Samarth Sinha, K J Joseph, Vineeth N Balasubramanian, Animesh Garg

    Abstract: Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time. This process necessitates storing copies of the model over time for each task that the pre-trained model is fine-tuned to. Building on top of recent model patching work, we propose $Δ$-Patching for fine-tuning neural network models in an efficient manner, without the need to s…

    Submitted 21 September, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

  47. arXiv:2303.14100  [pdf, other]

    cs.RO

    Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting

    Authors: Marta Skreta, Naruki Yoshikawa, Sebastian Arellano-Rubach, Zhi Ji, Lasse Bjørn Kristensen, Kourosh Darvish, Alán Aspuru-Guzik, Florian Shkurti, Animesh Garg

    Abstract: Generating low-level robot task plans from high-level natural language instructions remains a challenging problem. Although large language models have shown promising results in generating plans, the accuracy of the output remains unverified. Furthermore, the lack of domain-specific language data poses a limitation on the applicability of these models. In this paper, we propose CLAIRIFY, a novel a…

    Submitted 24 March, 2023; originally announced March 2023.

  48. arXiv:2303.12536  [pdf]

    cs.CR

    BlockChain and Decentralized Apps

    Authors: Aakash Garg, Ankit Tyagi, Anant Patel, Divyansh Raj

    Abstract: Blockchain, the backbone of Bitcoin, has recently gained a lot of attention. Blockchain functions as an immutable record that enables decentralized transactions. Blockchain-based applications are sprouting up in a variety of industries, including financial services, reputation systems, and the Internet of Things (IoT), among others. However, many hurdles of blockchain technology, including scalabi…

    Submitted 22 March, 2023; originally announced March 2023.

  49. arXiv:2303.10802  [pdf, other]

    cs.CV

    PASS: Peer-Agreement based Sample Selection for training with Noisy Labels

    Authors: Arpit Garg, Cuong Nguyen, Rafael Felix, Thanh-Toan Do, Gustavo Carneiro

    Abstract: The prevalence of noisy-label samples poses a significant challenge in deep learning, inducing overfitting effects. This has, therefore, motivated the emergence of learning with noisy-label (LNL) techniques that focus on separating noisy- and clean-label samples to apply different learning strategies to each group of samples. Current methodologies often rely on the small-loss hypothesis or feature…

    Submitted 30 April, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: In Submission

  50. arXiv:2303.04247  [pdf, other]

    cs.SE cs.CR

    Vulnerability Mimicking Mutants

    Authors: Aayush Garg, Renzo Degiovanni, Mike Papadakis, Yves Le Traon

    Abstract: With the increasing release of powerful language models trained on large code corpus (e.g. CodeBERT was trained on 6.4 million programs), a new family of mutation testing tools has arisen with the promise to generate more "natural" mutants in the sense that the mutated code aims at following the implicit rules and coding conventions typically produced by programmers. In this paper, we study to wha…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.12284