-
Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Authors:
Kiana Ehsani,
Tanmay Gupta,
Rose Hendrix,
Jordi Salvador,
Luca Weihs,
Kuo-Hao Zeng,
Kunal Pratap Singh,
Yejin Kim,
Winson Han,
Alvaro Herrasti,
Ranjay Krishna,
Dustin Schwenk,
Eli VanderBilt,
Aniruddha Kembhavi
Abstract:
Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires extensive reward shaping and auxiliary losses and is often too slow and ineffective for long-horizon tasks. While IL with human supervision is effective, collecting human trajectories at scale is extremely…
▽ More
Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires extensive reward shaping and auxiliary losses and is often too slow and ineffective for long-horizon tasks. While IL with human supervision is effective, collecting human trajectories at scale is extremely expensive. In this work, we show that imitating shortest-path planners in simulation produces agents that, given a language instruction, can proficiently navigate, explore, and manipulate objects in both simulation and in the real world using only RGB sensors (no depth map or GPS coordinates). This surprising result is enabled by our end-to-end, transformer-based, SPOC architecture, powerful visual encoders paired with extensive image augmentation, and the dramatic scale and diversity of our training data: millions of frames of shortest-path-expert trajectories collected inside approximately 200,000 procedurally generated houses containing 40,000 unique 3D assets. Our models, data, training code, and newly proposed 10-task benchmarking suite CHORES will be open-sourced.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Authors:
Open X-Embodiment Collaboration,
Abby O'Neill,
Abdul Rehman,
Abhinav Gupta,
Abhiram Maddukuri,
Abhishek Gupta,
Abhishek Padalkar,
Abraham Lee,
Acorn Pooley,
Agrim Gupta,
Ajay Mandlekar,
Ajinkya Jain,
Albert Tung,
Alex Bewley,
Alex Herzog,
Alex Irpan,
Alexander Khazatsky,
Anant Rai,
Anchit Gupta,
Andrew Wang,
Andrey Kolobov,
Anikait Singh,
Animesh Garg,
Aniruddha Kembhavi,
Annie Xie
, et al. (267 additional authors not shown)
Abstract:
Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…
▽ More
Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.
△ Less
Submitted 1 June, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Equitable-FL: Federated Learning with Sparsity for Resource-Constrained Environment
Authors:
Indrajeet Kumar Sinha,
Shekhar Verma,
Krishna Pratap Singh
Abstract:
In Federated Learning, model training is performed across multiple computing devices, where only parameters are shared with a common central server without exchanging their data instances. This strategy assumes abundance of resources on individual clients and utilizes these resources to build a richer model as user's models. However, when the assumption of the abundance of resources is violated, l…
▽ More
In Federated Learning, model training is performed across multiple computing devices, where only parameters are shared with a common central server without exchanging their data instances. This strategy assumes abundance of resources on individual clients and utilizes these resources to build a richer model as user's models. However, when the assumption of the abundance of resources is violated, learning may not be possible as some nodes may not be able to participate in the process. In this paper, we propose a sparse form of federated learning that performs well in a Resource Constrained Environment. Our goal is to make learning possible, regardless of a node's space, computing, or bandwidth scarcity. The method is based on the observation that model size viz a viz available resources defines resource scarcity, which entails that reduction of the number of parameters without affecting accuracy is key to model training in a resource-constrained environment. In this work, the Lottery Ticket Hypothesis approach is utilized to progressively sparsify models to encourage nodes with resource scarcity to participate in collaborative training. We validate Equitable-FL on the $MNIST$, $F-MNIST$, and $CIFAR-10$ benchmark datasets, as well as the $Brain-MRI$ data and the $PlantVillage$ datasets. Further, we examine the effect of sparsity on performance, model size compaction, and speed-up for training. Results obtained from experiments performed for training convolutional neural networks validate the efficacy of Equitable-FL in heterogeneous resource-constrained learning environment.
△ Less
Submitted 2 September, 2023;
originally announced September 2023.
-
FAM: fast adaptive federated meta-learning
Authors:
Indrajeet Kumar Sinha,
Shekhar Verma,
Krishna Pratap Singh
Abstract:
In this work, we propose a fast adaptive federated meta-learning (FAM) framework for collaboratively learning a single global model, which can then be personalized locally on individual clients. Federated learning enables multiple clients to collaborate to train a model without sharing data. Clients with insufficient data or data diversity participate in federated learning to learn a model with su…
▽ More
In this work, we propose a fast adaptive federated meta-learning (FAM) framework for collaboratively learning a single global model, which can then be personalized locally on individual clients. Federated learning enables multiple clients to collaborate to train a model without sharing data. Clients with insufficient data or data diversity participate in federated learning to learn a model with superior performance. Nonetheless, learning suffers when data distributions diverge. There is a need to learn a global model that can be adapted using client's specific information to create personalized models on clients is required. MRI data suffers from this problem, wherein, one, due to data acquisition challenges, local data at a site is sufficient for training an accurate model and two, there is a restriction of data sharing due to privacy concerns and three, there is a need for personalization of a learnt shared global model on account of domain shift across client sites. The global model is sparse and captures the common features in the MRI. This skeleton network is grown on each client to train a personalized model by learning additional client-specific parameters from local data. Experimental results show that the personalization process at each client quickly converges using a limited number of epochs. The personalized client models outperformed the locally trained models, demonstrating the efficacy of the FAM mechanism. Additionally, the sparse parameter set to be communicated during federated learning drastically reduced communication overhead, which makes the scheme viable for networks with limited resources.
△ Less
Submitted 1 September, 2023; v1 submitted 26 August, 2023;
originally announced August 2023.
-
DenseBAM-GI: Attention Augmented DeneseNet with momentum aided GRU for HMER
Authors:
Aniket Pal,
Krishna Pratap Singh
Abstract:
The task of recognising Handwritten Mathematical Expressions (HMER) is crucial in the fields of digital education and scholarly research. However, it is difficult to accurately determine the length and complex spatial relationships among symbols in handwritten mathematical expressions. In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER, where the encoder has a Bo…
▽ More
The task of recognising Handwritten Mathematical Expressions (HMER) is crucial in the fields of digital education and scholarly research. However, it is difficult to accurately determine the length and complex spatial relationships among symbols in handwritten mathematical expressions. In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER, where the encoder has a Bottleneck Attention Module (BAM) to improve feature representation and the decoder has a Gated Input-GRU (GI-GRU) unit with an extra gate to make decoding long and complex expressions easier. The proposed model is an efficient and lightweight architecture with performance equivalent to state-of-the-art models in terms of Expression Recognition Rate (exprate). It also performs better in terms of top 1, 2, and 3 error accuracy across the CROHME 2014, 2016, and 2019 datasets. DenseBAM-GI achieves the best exprate among all models on the CROHME 2019 dataset. Importantly, these successes are accomplished with a drop in the complexity of the calculation and a reduction in the need for GPU memory.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
Learning to Learn with Indispensable Connections
Authors:
Sambhavi Tiwari,
Manas Gogoi,
Shekhar Verma,
Krishna Pratap Singh
Abstract:
Meta-learning aims to solve unseen tasks with few labelled instances. Nevertheless, despite its effectiveness for quick learning in existing optimization-based methods, it has several flaws. Inconsequential connections are frequently seen during meta-training, which results in an over-parameterized neural network. Because of this, meta-testing observes unnecessary computations and extra memory ove…
▽ More
Meta-learning aims to solve unseen tasks with few labelled instances. Nevertheless, despite its effectiveness for quick learning in existing optimization-based methods, it has several flaws. Inconsequential connections are frequently seen during meta-training, which results in an over-parameterized neural network. Because of this, meta-testing observes unnecessary computations and extra memory overhead. To overcome such flaws. We propose a novel meta-learning method called Meta-LTH that includes indispensible (necessary) connections. We applied the lottery ticket hypothesis technique known as magnitude pruning to generate these crucial connections that can effectively solve few-shot learning problem. We aim to perform two things: (a) to find a sub-network capable of more adaptive meta-learning and (b) to learn new low-level features of unseen tasks and recombine those features with the already learned features during the meta-test phase. Experimental results show that our proposed Met-LTH method outperformed existing first-order MAML algorithm for three different classification datasets. Our method improves the classification accuracy by approximately 2% (20-way 1-shot task setting) for omniglot dataset.
△ Less
Submitted 6 April, 2023;
originally announced April 2023.
-
A General Purpose Supervisory Signal for Embodied Agents
Authors:
Kunal Pratap Singh,
Jordi Salvador,
Luca Weihs,
Aniruddha Kembhavi
Abstract:
Training effective embodied AI agents often involves manual reward engineering, expert imitation, specialized components such as maps, or leveraging additional sensors for depth and localization. Another approach is to use neural architectures alongside self-supervised objectives which encourage better representation learning. In practice, there are few guarantees that these self-supervised object…
▽ More
Training effective embodied AI agents often involves manual reward engineering, expert imitation, specialized components such as maps, or leveraging additional sensors for depth and localization. Another approach is to use neural architectures alongside self-supervised objectives which encourage better representation learning. In practice, there are few guarantees that these self-supervised objectives encode task-relevant information. We propose the Scene Graph Contrastive (SGC) loss, which uses scene graphs as general-purpose, training-only, supervisory signals. The SGC loss does away with explicit graph decoding and instead uses contrastive learning to align an agent's representation with a rich graphical encoding of its environment. The SGC loss is generally applicable, simple to implement, and encourages representations that encode objects' semantics, relationships, and history. Using the SGC loss, we attain significant gains on three embodied tasks: Object Navigation, Multi-Object Navigation, and Arm Point Navigation. Finally, we present studies and analyses which demonstrate the ability of our trained representation to encode semantic cues about the environment.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
Authors:
Kunal Pratap Singh,
Luca Weihs,
Alvaro Herrasti,
Jonghyun Choi,
Aniruddha Kemhavi,
Roozbeh Mottaghi
Abstract:
Embodied AI agents continue to become more capable every year with the advent of new models, environments, and benchmarks, but are still far away from being performant and reliable enough to be deployed in real, user-facing, applications. In this paper, we ask: can we bridge this gap by enabling agents to ask for assistance from an expert such as a human being? To this end, we propose the Ask4Help…
▽ More
Embodied AI agents continue to become more capable every year with the advent of new models, environments, and benchmarks, but are still far away from being performant and reliable enough to be deployed in real, user-facing, applications. In this paper, we ask: can we bridge this gap by enabling agents to ask for assistance from an expert such as a human being? To this end, we propose the Ask4Help policy that augments agents with the ability to request, and then use expert assistance. Ask4Help policies can be efficiently trained without modifying the original agent's parameters and learn a desirable trade-off between task performance and the amount of requested help, thereby reducing the cost of querying the expert. We evaluate Ask4Help on two different tasks -- object goal navigation and room rearrangement and see substantial improvements in performance using minimal help. On object navigation, an agent that achieves a $52\%$ success rate is raised to $86\%$ with $13\%$ help and for rearrangement, the state-of-the-art model with a $7\%$ success rate is dramatically improved to $90.4\%$ using $39\%$ help. Human trials with Ask4Help demonstrate the efficacy of our approach in practical scenarios. We release the code for Ask4Help here: https://github.com/allenai/ask4help.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
MAC: A Meta-Learning Approach for Feature Learning and Recombination
Authors:
S. Tiwari,
M. Gogoi,
S. Verma,
K. P. Singh
Abstract:
Optimization-based meta-learning aims to learn an initialization so that a new unseen task can be learned within a few gradient updates. Model Agnostic Meta-Learning (MAML) is a benchmark algorithm comprising two optimization loops. The inner loop is dedicated to learning a new task and the outer loop leads to meta-initialization. However, ANIL (almost no inner loop) algorithm shows that feature r…
▽ More
Optimization-based meta-learning aims to learn an initialization so that a new unseen task can be learned within a few gradient updates. Model Agnostic Meta-Learning (MAML) is a benchmark algorithm comprising two optimization loops. The inner loop is dedicated to learning a new task and the outer loop leads to meta-initialization. However, ANIL (almost no inner loop) algorithm shows that feature reuse is an alternative to rapid learning in MAML. Thus, the meta-initialization phase makes MAML primed for feature reuse and obviates the need for rapid learning. Contrary to ANIL, we hypothesize that there may be a need to learn new features during meta-testing. A new unseen task from non-similar distribution would necessitate rapid learning in addition reuse and recombination of existing features. In this paper, we invoke the width-depth duality of neural networks, wherein, we increase the width of the network by adding extra computational units (ACU). The ACUs enable the learning of new atomic features in the meta-testing task, and the associated increased width facilitates information propagation in the forwarding pass. The newly learnt features combine with existing features in the last layer for meta-learning. Experimental results show that our proposed MAC method outperformed existing ANIL algorithm for non-similar task distribution by approximately 13% (5-shot task setting)
△ Less
Submitted 24 May, 2024; v1 submitted 20 September, 2022;
originally announced September 2022.
-
BNAS v2: Learning Architectures for Binary Networks with Empirical Improvements
Authors:
Dahyun Kim,
Kunal Pratap Singh,
Jonghyun Choi
Abstract:
Backbone architectures of most binary networks are well-known floating point (FP) architectures such as the ResNet family. Questioning that the architectures designed for FP networks might not be the best for binary networks, we propose to search architectures for binary networks (BNAS) by defining a new search space for binary architectures and a novel search objective. Specifically, based on the…
▽ More
Backbone architectures of most binary networks are well-known floating point (FP) architectures such as the ResNet family. Questioning that the architectures designed for FP networks might not be the best for binary networks, we propose to search architectures for binary networks (BNAS) by defining a new search space for binary architectures and a novel search objective. Specifically, based on the cell based search method, we define the new search space of binary layer types, design a new cell template, and rediscover the utility of and propose to use the Zeroise layer instead of using it as a placeholder. The novel search objective diversifies early search to learn better performing binary architectures. We show that our method searches architectures with stable training curves despite the quantization error inherent in binary networks. Quantitative analyses demonstrate that our searched architectures outperform the architectures used in state-of-the-art binary networks and outperform or perform on par with state-of-the-art binary networks that employ various techniques other than architectural changes. In addition, we further propose improvements to the training scheme of our searched architectures. With the new training scheme for our searched architectures, we achieve the state-of-the-art performance by binary networks by outperforming all previous methods by non-trivial margins.
△ Less
Submitted 16 October, 2021;
originally announced October 2021.
-
Signature Verification using Geometrical Features and Artificial Neural Network Classifier
Authors:
Anamika Jain,
Satish Kumar Singh,
Krishna Pratap Singh
Abstract:
Signature verification has been one of the major researched areas in the field of computer vision. Many financial and legal organizations use signature verification as access control and authentication. Signature images are not rich in texture; however, they have much vital geometrical information. Through this work, we have proposed a signature verification methodology that is simple yet effectiv…
▽ More
Signature verification has been one of the major researched areas in the field of computer vision. Many financial and legal organizations use signature verification as access control and authentication. Signature images are not rich in texture; however, they have much vital geometrical information. Through this work, we have proposed a signature verification methodology that is simple yet effective. The technique presented in this paper harnesses the geometrical features of a signature image like center, isolated points, connected components, etc., and with the power of Artificial Neural Network (ANN) classifier, classifies the signature image based on their geometrical features. Publicly available dataset MCYT, BHSig260 (contains the image of two regional languages Bengali and Hindi) has been used in this paper to test the effectiveness of the proposed method. We have received a lower Equal Error Rate (EER) on MCYT 100 dataset and higher accuracy on the BHSig260 dataset.
△ Less
Submitted 4 August, 2021;
originally announced August 2021.
-
Factorizing Perception and Policy for Interactive Instruction Following
Authors:
Kunal Pratap Singh,
Suvaansh Bhambri,
Byeonghwi Kim,
Roozbeh Mottaghi,
Jonghyun Choi
Abstract:
Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents. The 'interactive instruction following' task attempts to make progress towards building agents that jointly navigate, interact, and reason in the environment at every step. To address the multifaceted problem, we propose a model that factorizes the task into int…
▽ More
Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents. The 'interactive instruction following' task attempts to make progress towards building agents that jointly navigate, interact, and reason in the environment at every step. To address the multifaceted problem, we propose a model that factorizes the task into interactive perception and action policy streams with enhanced components and name it as MOCA, a Modular Object-Centric Approach. We empirically validate that MOCA outperforms prior arts by significant margins on the ALFRED benchmark with improved generalization.
△ Less
Submitted 2 September, 2021; v1 submitted 6 December, 2020;
originally announced December 2020.
-
Learning Architectures for Binary Networks
Authors:
Dahyun Kim,
Kunal Pratap Singh,
Jonghyun Choi
Abstract:
Backbone architectures of most binary networks are well-known floating point architectures such as the ResNet family. Questioning that the architectures designed for floating point networks would not be the best for binary networks, we propose to search architectures for binary networks (BNAS) by defining a new search space for binary architectures and a novel search objective. Specifically, based…
▽ More
Backbone architectures of most binary networks are well-known floating point architectures such as the ResNet family. Questioning that the architectures designed for floating point networks would not be the best for binary networks, we propose to search architectures for binary networks (BNAS) by defining a new search space for binary architectures and a novel search objective. Specifically, based on the cell based search method, we define the new search space of binary layer types, design a new cell template, and rediscover the utility of and propose to use the Zeroise layer instead of using it as a placeholder. The novel search objective diversifies early search to learn better performing binary architectures. We show that our proposed method searches architectures with stable training curves despite the quantization error inherent in binary networks. Quantitative analyses demonstrate that our searched architectures outperform the architectures used in state-of-the-art binary networks and outperform or perform on par with state-of-the-art binary networks that employ various techniques other than architectural changes.
△ Less
Submitted 10 April, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Design of High Performance MIPS Cryptography Processor Based on T-DES Algorithm
Authors:
Kirat Pal Singh,
Shivani Parmar
Abstract:
The paper describes the design of high performance MIPS Cryptography processor based on triple data encryption standard. The organization of pipeline stages in such a way that pipeline can be clocked at high frequency. Encryption and Decryption blocks of triple data encryption standard (T-DES) crypto system and dependency among themselves are explained in detail with the help of block diagram. In…
▽ More
The paper describes the design of high performance MIPS Cryptography processor based on triple data encryption standard. The organization of pipeline stages in such a way that pipeline can be clocked at high frequency. Encryption and Decryption blocks of triple data encryption standard (T-DES) crypto system and dependency among themselves are explained in detail with the help of block diagram. In order to increase the processor functionality and performance, especially for security applications we include three new 32-bit instructions LKLW, LKUW and CRYPT. The design has been synthesized at 40nm process technology targeting using Xilinx Virtex-6 device. The overall MIPS Crypto processor works at 209MHz.
△ Less
Submitted 8 March, 2015;
originally announced March 2015.
-
Efficient Hardware Design and Implementation of Encrypted MIPS Processor
Authors:
Kirat Pal Singh,
Dilip Kumar
Abstract:
The paper describes the design and hardware implementation of 32-bit encrypted MIPS processor based on MIPS pipeline architecture. The organization of pipeline stages in such a way that pipeline can be clocked at high frequency. Encryption and Decryption blocks of data encryption standard (DES) cryptosystem and dependency among themselves are explained in detail with the help of block diagram. In…
▽ More
The paper describes the design and hardware implementation of 32-bit encrypted MIPS processor based on MIPS pipeline architecture. The organization of pipeline stages in such a way that pipeline can be clocked at high frequency. Encryption and Decryption blocks of data encryption standard (DES) cryptosystem and dependency among themselves are explained in detail with the help of block diagram. In order to increase the processor functionality and performance, especially for security applications we include three new instructions 32-bit LKLW, LKUW and CRYPT. The design has been synthesized at 40nm process technology targeting using Xilinx Virtex-6 device. The encrypted MIPS pipeline processor can work at 218MHz at synthesis level and 744MHz at simulation level.
△ Less
Submitted 8 March, 2015;
originally announced March 2015.
-
Performance Evaluation of Low Power MIPS Crypto Processor based on Cryptography Algorithms
Authors:
Kirat Pal Singh,
Dilip Kumar
Abstract:
This paper presents the design and implementation of low power 32-bit encrypted and decrypted MIPS processor for Data Encryption Standard (DES), Triple DES, Advanced Encryption Standard (AES) based on MIPS pipeline architecture. The organization of pipeline stages has been done in such a way that pipeline can be clocked at high frequency. Encryption and Decryption blocks of three standard cryptogr…
▽ More
This paper presents the design and implementation of low power 32-bit encrypted and decrypted MIPS processor for Data Encryption Standard (DES), Triple DES, Advanced Encryption Standard (AES) based on MIPS pipeline architecture. The organization of pipeline stages has been done in such a way that pipeline can be clocked at high frequency. Encryption and Decryption blocks of three standard cryptography algorithms on MIPS processor and dependency among themselves are explained in detail with the help of a block diagram. Clock gating technique is used to reduce the power consumption in MIPS crypto processor. This approach results in processor that meets power consumption and performance specification for security applications. Proposed Implementation approach concludes higher system performance while reducing operating power consumption. Testing results shows that the MIPS crypto processor operates successfully at a working frequency of 218MHz and a bandwidth of 664Mbits/s.
△ Less
Submitted 8 June, 2013;
originally announced June 2013.
-
Performance Evaluation of Enhanced Interior Gateway Routing Protocol in IPv6 Network
Authors:
Kuwar Pratap Singh,
P K Gupta,
G Singh
Abstract:
With the explosive growth in communication and network technologies, there is a great demand of IPv6 addressing scheme. However, the modern operating systems has option for this and with the development of IPv6 which removes the limitations imposed by IPv4 and provides the large number of address space. In this paper, authors have considered the Enhanced Interior Gateway Routing Protocol and prese…
▽ More
With the explosive growth in communication and network technologies, there is a great demand of IPv6 addressing scheme. However, the modern operating systems has option for this and with the development of IPv6 which removes the limitations imposed by IPv4 and provides the large number of address space. In this paper, authors have considered the Enhanced Interior Gateway Routing Protocol and presented a scenario for its performance evaluation in IPv6 networks and obtained results are highly considerable for the short distance of communication and don't represent any problem of performance degradation while sending or receiving the data.
△ Less
Submitted 18 May, 2013;
originally announced May 2013.