
Pick-or-Mix: Dynamic Channel Sampling for ConvNets

Authors: Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera

Abstract: Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions, which account for a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for dynamic channel sampling, namely Pick-or-Mix (PiX), which does not require special implementation. PiX divides a set of channels into subsets and then picks from them, where the picking decision is made dynamically for each pixel based on the input activations. We plug PiX into prominent ConvNet architectures and verify its multi-purpose utilities. After replacing 1x1 channel squeezing layers in ResNet with PiX, the network becomes 25% faster without losing accuracy. We show that PiX allows ConvNets to learn better data representation than widely adopted approaches to enhance networks' representation power (e.g., SE, CBAM, AFF, SKNet, and DWP). We also show that PiX achieves state-of-the-art performance on network downscaling and dynamic channel pruning applications.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Published in Computer Vision and Pattern Recognition (CVPR 2024)

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv:2402.14591

High-Speed Detector For Low-Powered Devices In Aerial Grasping

Authors: Ashish Kumar, Laxmidhar Behera

Abstract: Autonomous aerial harvesting is a highly complex problem because it requires numerous interdisciplinary algorithms to be executed on mini low-powered computing devices. Object detection is one such algorithm that is compute-hungry. In this context, we make the following contributions: (i) Fast Fruit Detector (FFD), a resource-efficient, single-stage, and postprocessing-free object detector based on our novel latent object representation (LOR) module, query assignment, and prediction strategy. FFD achieves 100FPS@FP32 precision on the latest 10W NVIDIA Jetson-NX embedded device while co-existing with other time-critical sub-systems such as control, grasping, and SLAM, a major achievement of this work. (ii) A method to generate vast amounts of training data without exhaustive manual labelling of fruit images, since they consist of a large number of instances, which increases the labelling cost and time. (iii) An open-source fruit detection dataset having plenty of very small-sized instances that are difficult to detect. Our exhaustive evaluations on our dataset and the MinneApple dataset show that FFD, being only a single-scale detector, is more accurate than many representative detectors, e.g. FFD is better than single-scale Faster-RCNN by 10.7 AP, multi-scale Faster-RCNN by 2.3 AP, and better than the latest single-scale YOLO-v8 by 8 AP and multi-scale YOLO-v8 by 0.3 AP while being considerably faster.

Submitted 1 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 8 Pages, 9 Figures, 8 Tables, IEEE Robotics and Automation Letters (IEEE RA-L)

arXiv:2312.16741

Bin-picking of novel objects through category-agnostic-segmentation: RGB matters

Authors: Prem Raj, Sachin Bhadang, Gaurav Chaudhary, Laxmidhar Behera, Tushar Sandhan

Abstract: This paper addresses category-agnostic instance segmentation for robotic manipulation, focusing on segmenting objects independent of their class to enable versatile applications like bin-picking in dynamic environments. Existing methods often lack generalizability and object-specific information, leading to grasp failures. We present a novel approach leveraging object-centric instance segmentation and simulation-based training for effective transfer to real-world scenarios. Notably, our strategy overcomes challenges posed by noisy depth sensors, enhancing the reliability of learning. Our solution accommodates transparent and semi-transparent objects, which are historically difficult for depth-based grasping methods. Contributions include domain randomization for successful transfer, our collected dataset for warehouse applications, and an integrated framework for efficient bin-picking. Our trained instance segmentation model achieves state-of-the-art performance on the WISDOM public benchmark [1] and also on the custom-created dataset. In a challenging real-world bin-picking setup, our bin-picking framework achieves 98% accuracy for opaque objects and 97% accuracy for non-opaque objects, outperforming the state-of-the-art baselines by a considerable margin.

Submitted 27 December, 2023; originally announced December 2023.

Comments: Presented at IEEE International Conference on Robotic Computing (IRC), 2023

arXiv:2312.12637

Domain-Independent Disperse and Pick method for Robotic Grasping

Authors: Prem Raj, Aniruddha Singhal, Vipul Sanap, L. Behera, Rajesh Sinha

Abstract: Picking unseen objects from clutter is a difficult problem because of the variability in objects (shape, size, and material) and occlusion due to clutter. As a result, it becomes difficult for grasping methods to segment the objects properly and they fail to singulate the object to be picked. This may result in grasp failure or picking of multiple objects together in a single attempt. A push-to-move action by the robot will be beneficial to disperse the objects in the workspace and thus assist the grasping and vision algorithm. We propose a disperse and pick method for domain-independent robotic grasping in a highly cluttered heap of objects. The novel contribution of our framework is the introduction of a heuristic clutter removal method that does not require deep learning and can work on unseen objects. At each iteration of the algorithm, the robot either performs a push-to-move action or a grasp action based on the estimated clutter profile. For grasp planning, we present an improved and adaptive version of a recent domain-independent grasping method. The efficacy of the integrated system is demonstrated in simulation as well as in the real world.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Published at 2022 International Joint Conference on Neural Networks (IJCNN)

Journal ref: 10.1109/IJCNN55064.2022.9892672

arXiv:2308.07081

Aesthetics of Sanskrit Poetry from the Perspective of Computational Linguistics: A Case Study Analysis on Siksastaka

Authors: Jivnesh Sandhan, Amruta Barbadikar, Malay Maity, Pavankumar Satuluri, Tushar Sandhan, Ravi M. Gupta, Pawan Goyal, Laxmidhar Behera

Abstract: Sanskrit poetry has played a significant role in shaping the literary and cultural landscape of the Indian subcontinent for centuries. However, not much attention has been devoted to uncovering the hidden beauty of Sanskrit poetry in computational linguistics. This article explores the intersection of Sanskrit poetry and computational linguistics by proposing a roadmap of an interpretable framework to analyze and classify the qualities and characteristics of fine Sanskrit poetry. We discuss the rich tradition of Sanskrit poetry and the significance of computational linguistics in automatically identifying the characteristics of fine poetry. The proposed framework involves a human-in-the-loop approach that combines deterministic aspects delegated to machines and deep semantics left to human experts. We provide a deep analysis of Siksastaka, a Sanskrit poem, from the perspective of 6 prominent kavyashastra schools, to illustrate the proposed framework. Additionally, we provide compound, dependency, anvaya (prose order linearised form), meter, rasa (mood), alankar (figure of speech), and riti (writing style) annotations for Siksastaka and a web application to illustrate the poem's analysis and annotations. Our key contributions include the proposed framework, the analysis of Siksastaka, the annotations and the web application for future research. Link for interactive analysis: https://sanskritshala.github.io/shikshastakam/

Submitted 14 August, 2023; originally announced August 2023.

Comments: 15 pages

arXiv:2302.09527

SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes

Authors: Jivnesh Sandhan, Anshul Agarwal, Laxmidhar Behera, Tushar Sandhan, Pawan Goyal

Abstract: We present a neural Sanskrit Natural Language Processing (NLP) toolkit named SanskritShala (a school of Sanskrit) to facilitate computational linguistic analyses for several tasks such as word segmentation, morphological tagging, dependency parsing, and compound type identification. Our systems currently report state-of-the-art performance on available benchmark datasets for all tasks. SanskritShala is deployed as a web-based application, which allows a user to get real-time analysis for the given input. It is built with easy-to-use interactive data annotation features that allow annotators to correct the system predictions when it makes mistakes. We publicly release the source code of the 4 modules included in the toolkit, 7 word embedding models that have been trained on publicly available Sanskrit corpora, and multiple annotated datasets (word similarity, relatedness, categorization, and analogy prediction) to assess intrinsic properties of word embeddings. So far as we know, this is the first neural-based Sanskrit NLP toolkit that has a web-based interface and a number of NLP modules. We are sure that the people who are willing to work with Sanskrit will find it useful for pedagogical and annotative purposes. SanskritShala is available at: https://cnerg.iitkgp.ac.in/sanskritshala. The demo video of our platform can be accessed at: https://youtu.be/x0X31Y9k0mw4.

Submitted 29 May, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: 7 pages, Accepted at ACL23 (Demo track) to be held at Toronto, Canada

arXiv:2210.11753

TransLIST: A Transformer-Based Linguistically Informed Sanskrit Tokenizer

Authors: Jivnesh Sandhan, Rathin Singha, Narein Rao, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal

Abstract: Sanskrit Word Segmentation (SWS) is essential in making digitized texts available and in deploying downstream tasks. It is, however, non-trivial because of the sandhi phenomenon that modifies the characters at the word boundaries, and needs special treatment. Existing lexicon-driven approaches for SWS make use of Sanskrit Heritage Reader, a lexicon-driven shallow parser, to generate the complete candidate solution space, over which various methods are applied to produce the most valid solution. However, these approaches fail when encountering out-of-vocabulary tokens. On the other hand, purely engineering methods for SWS have made use of recent advances in deep learning, but cannot make use of the latent word information when it is available. To mitigate the shortcomings of both families of approaches, we propose the Transformer-based Linguistically Informed Sanskrit Tokenizer (TransLIST), consisting of (1) a module that encodes the character input along with latent-word information, which takes into account the sandhi phenomenon specific to SWS and is apt to work with partial or no candidate solutions, (2) a novel soft-masked attention to prioritize potential candidate words, and (3) a novel path ranking algorithm to rectify the corrupted predictions. Experiments on the benchmark datasets for SWS show that TransLIST outperforms the current state-of-the-art system by an average absolute gain of 7.2 points in terms of the perfect match (PM) metric. The codebase and datasets are publicly available at https://github.com/rsingha108/TransLIST

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP22 (Findings)

arXiv:2208.10310

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

Authors: Jivnesh Sandhan, Ashish Gupta, Hrishikesh Terdalkar, Tushar Sandhan, Suvendu Samanta, Laxmidhar Behera, Pawan Goyal

Abstract: The phenomenon of compounding is ubiquitous in Sanskrit. It serves to achieve brevity in expressing thoughts, while simultaneously enriching the lexical and structural formation of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, where we consider the problem of identifying semantic relations between the components of a compound word. Earlier approaches rely solely on the lexical information obtained from the components and ignore the most crucial contextual and syntactic information useful for SaCTI. However, the SaCTI task is challenging primarily due to the implicitly encoded context-sensitive semantic relation between the compound components. Thus, we propose a novel multi-task learning architecture which incorporates the contextual information and enriches the complementary syntactic information using morphological tagging and dependency parsing as two auxiliary tasks. Experiments on the benchmark datasets for SaCTI show 6.1 points (Accuracy) and 7.7 points (F1-score) absolute gain compared to the state-of-the-art system. Further, our multi-lingual experiments demonstrate the efficacy of the proposed architecture in English and Marathi languages. The code and datasets are publicly available at https://github.com/ashishgupta2598/SaCTI

Submitted 11 September, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: The work is accepted at COLING22, Gyeongju, Republic of Korea

arXiv:2201.11391

Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages

Authors: Jivnesh Sandhan, Ayush Daksh, Om Adideva Paranjay, Laxmidhar Behera, Pawan Goyal

Abstract: Nowadays, the interest in code-mixing has become ubiquitous in Natural Language Processing (NLP); however, not much attention has been given to addressing this phenomenon for the Speech Translation (ST) task. This can be solely attributed to the lack of labelled data for the code-mixed ST task. Thus, we introduce Prabhupadavani, which is a multilingual code-mixed ST dataset for 25 languages. It is multi-domain, covers ten language families, and contains 94 hours of speech by 130+ speakers, manually aligned with corresponding text in the target language. Prabhupadavani is about Vedic culture and heritage from Indic literature, where code-switching in the case of quotation from literature is important in the context of humanities teaching. To the best of our knowledge, Prabhupadavani is the first multi-lingual code-mixed ST dataset available in the ST literature. This data can also be used for a code-mixed machine translation task. The full dataset can be accessed at https://github.com/frozentoad9/CMST.

Submitted 4 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: The work is accepted at COLING22-SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

arXiv:2201.11374

Systematic Investigation of Strategies Tailored for Low-Resource Settings for Low-Resource Dependency Parsing

Authors: Jivnesh Sandhan, Laxmidhar Behera, Pawan Goyal

Abstract: In this work, we focus on low-resource dependency parsing for multiple languages. Several strategies are tailored to enhance performance in low-resource scenarios. While these are well-known to the community, it is not trivial to select the best-performing combination of these strategies for a low-resource language that we are interested in, and not much attention has been given to measuring the efficacy of these strategies. We experiment with 5 low-resource strategies for our ensembled approach on 7 Universal Dependency (UD) low-resource languages. Our exhaustive experimentation on these languages supports the effective improvements for languages not covered in pretrained models. We show a successful application of the ensembled system on a truly low-resource language, Sanskrit. The code and data are available at: https://github.com/Jivnesh/SanDP

Submitted 29 January, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: Accepted at EACL2023 to be held in Croatia Europe

arXiv:2112.08024

Visually Guided UGV for Autonomous Mobile Manipulation in Dynamic and Unstructured GPS Denied Environments

Authors: Mohit Vohra, Laxmidhar Behera

Abstract: A robotic solution for unmanned ground vehicles (UGVs) to execute the highly complex task of object manipulation in an autonomous mode is presented. This paper primarily focuses on developing an autonomous robotic system capable of assembling elementary blocks to build large 3D structures in GPS-denied environments. The key contributions of this system paper are i) the design of a deep learning-based unified multi-task visual perception system for object detection, part-detection, instance segmentation, and tracking, ii) an electromagnetic gripper design for robust grasping, and iii) system integration in which multiple system components are integrated to develop an optimized software stack. The entire mechatronic and algorithmic design of the UGV for the above application is detailed in this work. The performance and efficacy of the overall system are reported through several rigorous experiments.

Submitted 15 December, 2021; originally announced December 2021.

Comments: Paper has been accepted for publication in International Conference On Computational Intelligence - ICCI 2021

arXiv:2107.12701

End-To-End Real-Time Visual Perception Framework for Construction Automation

Authors: Mohit Vohra, Ashish Kumar, Ravi Prakash, Laxmidhar Behera

Abstract: In this work, we present a robotic solution to automate the task of wall construction. To that end, we present an end-to-end visual perception framework that can quickly detect and localize bricks in clutter. Further, we present a computationally light method of brick pose estimation that incorporates the above information. Unlike YOLO and SSD, the proposed detection network predicts a rotated box, thereby maximizing the object's region in the predicted box regions. In addition, precision P, recall R, and mean-average-precision (mAP) scores are reported to evaluate the proposed framework. We observed that for our task, the proposed scheme outperforms the upright bounding box detectors. Further, we deploy the proposed visual perception framework on a robotic system endowed with a UR5 robot manipulator and demonstrate that the system can successfully replicate a simplified version of the wall-building task in an autonomous mode.

Submitted 27 July, 2021; originally announced July 2021.

Comments: The paper has been accepted as a regular paper in IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2021

arXiv:2104.09099

Edge and Corner Detection in Unorganized Point Clouds for Robotic Pick and Place Applications

Authors: Mohit Vohra, Ravi Prakash, Laxmidhar Behera

Abstract: In this paper, we propose a novel edge and corner detection algorithm for an unorganized point cloud. Our edge detection method classifies a query point as an edge point by evaluating the distribution of local neighboring points around the query point. The proposed technique has been tested on generic items such as dragons, bunnies, and coffee cups from the Stanford 3D scanning repository. The proposed technique can be directly applied to real and unprocessed point cloud data of random clutter of objects. To demonstrate the proposed technique's efficacy, we compare it to other solutions for 3D edge extraction in unorganized point cloud data. We observed that the proposed method could handle raw and noisy data with little variation in parameters compared to other methods. We also extend the algorithm to estimate the 6D pose of known objects in the presence of dense clutter while handling multiple instances of the object. The overall approach is tested for a warehouse application, where an actual UR5 robot manipulator is used for robotic pick and place operations in an autonomous mode.

Submitted 21 April, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

Comments: Paper has been accepted for presentation in International Conference on Informatics in Control, Automation and Robotics (ICINCO, 2021)

arXiv:2104.00270

Evaluating Neural Word Embeddings for Sanskrit

Authors: Jivnesh Sandhan, Om Adideva, Digumarthi Komal, Laxmidhar Behera, Pawan Goyal

Abstract: Recently, the supervised learning paradigm's surprisingly remarkable performance has garnered considerable attention from Sanskrit Computational Linguists. As a result, the Sanskrit community has put laudable efforts into building task-specific labeled data for various downstream Natural Language Processing (NLP) tasks. The primary component of these approaches comes from representations of word embeddings. Word embedding helps to transfer knowledge learned from readily available unlabelled data for improving task-specific performance in low-resource settings. Over the last decade, there has been much excitement in the field of digitization of Sanskrit. To effectively use such readily available resources, it is essential to perform a systematic study on word embedding approaches for the Sanskrit language. In this work, we investigate the effectiveness of word embeddings. We classify word embeddings in broad categories to facilitate systematic experimentation and evaluate them on four intrinsic tasks. We investigate the efficacy of embedding approaches (originally proposed for languages other than Sanskrit) for Sanskrit, along with various challenges posed by the language.

Submitted 1 April, 2021; originally announced April 2021.

Comments: 14 pages, The work is submitted at WSC 2022, Canberra, Australia

arXiv:2102.06551

A Little Pretraining Goes a Long Way: A Case Study on Dependency Parsing Task for Low-resource Morphologically Rich Languages

Authors: Jivnesh Sandhan, Amrith Krishna, Ashim Gupta, Laxmidhar Behera, Pawan Goyal

Abstract: Neural dependency parsing has achieved remarkable performance for many domains and languages. The bottleneck of massive labeled data limits the effectiveness of these approaches for low-resource languages. In this work, we focus on dependency parsing for morphologically rich languages (MRLs) in a low-resource setting. Although morphological information is essential for the dependency parsing task, morphological disambiguation and the lack of powerful analyzers make it challenging to obtain this information for MRLs. To address these challenges, we propose simple auxiliary tasks for pretraining. We perform experiments on 10 MRLs in low-resource settings to measure the efficacy of our proposed pretraining method and observe an average absolute gain of 2 points (UAS) and 3.6 points (LAS). Code and data available at: https://github.com/jivnesh/LCM

Submitted 12 April, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: 6 pages, The work is accepted at EACL-SRW, 2021, Kyiv, Ukraine. Typos corrected in Section 3.2

arXiv:2101.06414

Towards Deep Learning Assisted Autonomous UAVs for Manipulation Tasks in GPS-Denied Environments

Authors: Ashish Kumar, Mohit Vohra, Ravi Prakash, L. Behera

Abstract: In this work, we present a pragmatic approach to enable unmanned aerial vehicles (UAVs) to autonomously perform highly complicated tasks of object pick and place. This paper is largely inspired by challenge-2 of MBZIRC 2020 and is primarily focused on the task of assembling large 3D structures in outdoor and GPS-denied environments. Primary contributions of this system are: (i) a novel computationally efficient deep learning based unified multi-task visual perception system for target localization, part segmentation, and tracking, (ii) a novel deep learning based grasp state estimation, (iii) a retracting electromagnetic gripper design, (iv) a remote computing approach which exploits state-of-the-art MIMO based high speed (5000Mb/s) wireless links to allow the UAVs to execute compute intensive tasks on remote high end compute servers, and (v) system integration in which several system components are woven together in order to develop an optimized software stack. We use the DJI Matrice-600 Pro, a hex-rotor UAV, and interface it with the custom designed gripper. Our framework is deployed on the specified UAV in order to report the performance analysis of the individual modules. Apart from the manipulation system, we also highlight several hidden challenges associated with the UAVs in this context.

Submitted 16 January, 2021; originally announced January 2021.

Comments: 8 pages, 5 figures, 5 tables, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020

arXiv:2101.06411

DeepMI: A Mutual Information Based Framework For Unsupervised Deep Learning of Tasks

Authors: Ashish Kumar, Laxmidhar Behera

Abstract: In this work, we propose an information theory based framework, DeepMI, to train deep neural networks (DNN) using Mutual Information (MI). The DeepMI framework is especially targeted at, but not limited to, the learning of real world tasks in an unsupervised manner. The primary motivation behind this work is the limitation of the traditional loss functions for unsupervised learning of a given task. Directly using MI for training is quite challenging because it is unbounded above. Hence, we develop an alternative linearized representation of MI as a part of the framework. Contributions of this paper are threefold: i) investigation of MI to train deep neural networks, ii) a novel loss function $L_{LMI}$, and iii) a fuzzy logic based end-to-end differentiable pipeline to integrate DeepMI into a deep learning framework. Due to the unavailability of a standard benchmark, we carefully design the experimental analysis and select three different tasks for the experimental study. We demonstrate that $L_{LMI}$ alone provides better gradients, and thus better neural network performance, than the popular loss functions, including in cases when multiple loss functions are used for a given task.

Submitted 4 March, 2022; v1 submitted 16 January, 2021; originally announced January 2021.

Comments: 10 pages, 1 figure, 2 tables

arXiv:2101.06409 [pdf, other]

Shape Back-Projection In 3D Scenes

Authors: Ashish Kumar, L. Behera

Abstract: In this work, we propose a novel framework, shape back-projection, for computationally efficient point cloud processing in a probabilistic manner. The primary components of the technique are a shape histogram and a back-projection procedure. The technique measures similarity between 3D surfaces by analyzing their geometrical properties. It is analogous to color back-projection, which measures similarity between images simply by looking at their color distributions. In the overall process, the shape histogram of a sample surface (e.g. planar) is first computed, which captures the profile of surface normals around a point in the form of a probability distribution. The histogram is then back-projected onto a test surface and a likelihood score is obtained. The score indicates how likely a point on the test surface is to behave like the sample surface, geometrically. Shape back-projection finds application in binary surface classification, high curvature edge detection in unorganized point clouds, automated point cloud labeling for 3D-CNNs (convolutional neural networks), etc. The algorithm can also be used for real-time robotic operations such as autonomous object picking in warehouse automation and ground plane extraction for autonomous vehicles, and can be deployed easily on computationally limited platforms (UAVs).

Submitted 16 January, 2021; originally announced January 2021.

Comments: 7 pages, 7 figures, 3 tables

arXiv:2101.06405 [pdf, other]

Semi Supervised Deep Quick Instance Detection and Segmentation

Authors: Ashish Kumar, L. Behera

Abstract: In this paper, we present a semi supervised deep quick learning framework for instance detection and pixel-wise semantic segmentation of images in a dense clutter of items. The framework can quickly and incrementally learn novel items in an online manner by acquiring data in real time and generating the corresponding ground truths on its own. To learn various combinations of items, it can synthesize cluttered scenes in real time. The overall approach is based on the tutor-child analogy, in which a deep network (tutor) is pretrained for class-agnostic object detection and generates labeled data for another deep network (child). The child utilizes a customized convolutional neural network head for the purpose of quick learning. There are broadly four key components of the proposed framework: semi supervised labeling, occlusion aware clutter synthesis, a customized convolutional neural network head, and instance detection. The initial version of this framework was implemented during our participation in the Amazon Robotics Challenge (ARC), 2017. Our system was ranked 3rd, 4th and 5th worldwide in the pick, stow-pick and stow tasks respectively. The proposed framework is an improved version of ARC17 to which novel features such as instance detection and online learning have been added.

Submitted 16 January, 2021; originally announced January 2021.

Comments: 7 Pages, 7 Figures, 5 Tables. 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019

arXiv:2010.05155 [pdf, other]

A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS)

Authors: Anima Majumder, Samrat Dutta, Swagat Kumar, Laxmidhar Behera

Abstract: This paper looks into the problem of handling imbalanced data in a multi-label classification problem. The problem is solved by proposing two novel methods that primarily exploit the geometric relationship between the feature vectors. The first one is an undersampling algorithm that uses the angle between feature vectors to select more informative samples while rejecting the less informative ones. A suitable criterion is proposed to define the informativeness of a given sample. The second one is an oversampling algorithm that uses a generative algorithm to create new synthetic data that respects all class boundaries. This is achieved by finding "no man's land" based on the Euclidean distance between the feature vectors. The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem based on a mixture of Gaussians. The superiority of the proposed algorithms is established through comparison with other state-of-the-art methods, including SMOTE and ADASYN, over ten different publicly available datasets exhibiting high-to-extreme data imbalance. These two methods are combined into a single data processing framework, labeled "GICaPS" to highlight the role of geometry-based information (GI) sampling and Class-Prioritized Synthesis (CaPS) in dealing with the multi-class data imbalance problem, thereby making a novel contribution in this field.

Submitted 11 October, 2020; originally announced October 2020.

arXiv:2007.04645 [pdf, other]

doi 10.1109/IROS45743.2020.9341756

Learning to Switch CNNs with Model Agnostic Meta Learning for Fine Precision Visual Servoing

Authors: Prem Raj, Vinay P. Namboodiri, L. Behera

Abstract: Convolutional Neural Networks (CNNs) have been successfully applied to relative camera pose estimation from labeled image-pair data, without requiring any hand-engineered features, camera intrinsic parameters or depth information. The trained CNN can be utilized for performing pose based visual servo control (PBVS). One of the ways to improve the quality of visual servo output is to improve the accuracy of the CNN's relative pose estimation. With a given state-of-the-art CNN for relative pose regression, how can we achieve improved performance for visual servo control? In this paper, we explore switching of CNNs to improve the precision of visual servo control. The idea of switching a CNN arises from the fact that the dataset for training a relative camera pose regressor for visual servo control must contain variations in relative pose ranging from a very small scale to a larger scale. We found that training two different instances of the CNN, one for large-scale displacements (LSD) and another for small-scale displacements (SSD), and switching between them during visual servo execution yields better results than training a single CNN with the combined LSD+SSD data. However, this causes extra storage overhead, and the switching decision is taken by a manually set threshold which may not be optimal for all scenes. To eliminate these drawbacks, we propose an efficient switching strategy based on the model agnostic meta learning (MAML) algorithm. In this strategy, a single model is trained to learn parameters which are simultaneously good for multiple tasks, namely a binary classification for the switching decision, a 6DOF pose regression for LSD data and a 6DOF pose regression for SSD data. The proposed approach performs far better than the naive approach, while storage and run-time overheads are almost negligible.

Submitted 9 July, 2020; originally announced July 2020.

Comments: Accepted in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2020). For video visit - https://youtu.be/GSG20lmWDUo

Journal ref: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 10210-10217

arXiv:2001.05856 [pdf, other]

Domain Independent Unsupervised Learning to grasp the Novel Objects

Authors: Siddhartha Vibhu Pharswan, Mohit Vohra, Ashish Kumar, Laxmidhar Behera

Abstract: One of the main challenges in vision-based grasping is the selection of feasible grasp regions while interacting with novel objects. Recent approaches exploit the power of convolutional neural networks (CNNs) to achieve accurate grasping at the cost of high computational power and time. In this paper, we present a novel unsupervised learning based algorithm for the selection of feasible grasp regions. Unsupervised learning infers the pattern in a data set without any external labels. We apply k-means clustering on the image plane to identify the grasp regions, followed by an axis assignment method. We define a novel concept, the Grasp Decide Index (GDI), to select the best grasp pose in the image plane. We have conducted several experiments in cluttered and isolated environments on standard objects of the Amazon Robotics Challenge 2017 and the Amazon Picking Challenge 2016. We compare the results with prior learning based approaches to validate the robustness and adaptive nature of our algorithm for a variety of novel objects in different domains.

Submitted 9 January, 2020; originally announced January 2020.

Comments: Paper has been accepted for publication in IROS2019

arXiv:2001.02076 [pdf, other]

Real-time Grasp Pose Estimation for Novel Objects in Densely Cluttered Environment

Authors: Mohit Vohra, Ravi Prakash, Laxmidhar Behera

Abstract: Grasping of novel objects in pick and place applications is a fundamental and challenging problem in robotics, specifically for complex-shaped objects. It is observed that well-known strategies such as (i) grasping from the centroid of the object and (ii) grasping along the major axis of the object often fail for complex-shaped objects. In this paper, a real-time grasp pose estimation strategy for novel objects in robotic pick and place applications is proposed. The proposed technique estimates the object contour in the point cloud and predicts the grasp pose along with the object skeleton in the image plane. The technique is tested on objects such as a ball container, a hand weight and a tennis ball, and even on complex-shaped objects such as a blower (non-convex shape). It is observed that the proposed strategy performs very well for complex-shaped objects and predicts valid grasp configurations in comparison with the above strategies. The proposed grasping technique is validated experimentally in two scenarios: when the objects are placed distinctly and when the objects are placed in dense clutter. Grasp accuracies of 88.16% and 77.03%, respectively, are reported. All the experiments are performed with a real UR10 robot manipulator and a WSG-50 two-finger gripper for grasping objects.

Submitted 3 January, 2020; originally announced January 2020.

Comments: Paper has been accepted for publication in IEEE RoMan-2019

arXiv:1901.07457 [pdf, ps, other]

Divergence Framework for EEG based Multiclass Motor Imagery Brain Computer Interface

Authors: Satyam Kumar, Tharun Kumar Reddy, Laxmidhar Behera

Abstract: As with most real world data, the ubiquitous presence of non-stationarities in EEG signals significantly perturbs the feature distribution, thus deteriorating the performance of Brain Computer Interfaces. In this letter, a novel method based on Joint Approximate Diagonalization (JAD) is proposed to optimize stationarity for multiclass motor imagery Brain Computer Interfaces (BCIs) in an information theoretic framework. Specifically, in the proposed method, we estimate the subspace which optimizes the discriminability between the classes and simultaneously preserves stationarity within the motor imagery classes. We determine the subspace for the proposed approach through optimization using gradient descent on an orthogonal manifold. The performance of the proposed stationarity enforcing algorithm is compared to that of the baseline One-Versus-Rest (OVR)-CSP and JAD on the publicly available BCI competition IV dataset IIa. Results show an improvement in average classification accuracy across subjects over the baseline algorithms, and thus the value of alleviating within-session non-stationarities.

Submitted 12 January, 2019; originally announced January 2019.

arXiv:1703.02340 [pdf, ps, other]

Design and Development of an automated Robotic Pick & Stow System for an e-Commerce Warehouse

Authors: Swagat Kumar, Anima Majumder, Samrat Dutta, Rekha Raja, Sharath Jotawar, Ashish Kumar, Manish Soni, Venkat Raju, Olyvia Kundu, Ehtesham Hassan, Laxmidhar Behera, K. S. Venkatesh, Rajesh Sinha

Abstract: In this paper, we provide details of a robotic system that can automate the task of picking and stowing objects from and to a rack in an e-commerce fulfillment warehouse. The system comprises four main modules: (1) a perception module responsible for recognizing query objects and localizing them in the 3-dimensional robot workspace; (2) a planning module that generates the paths the robot end-effector has to take to reach the objects in the rack or in the tote; (3) a calibration module that defines the physical workspace for the robot visible through the on-board vision system; and (4) a gripping and suction system for picking and stowing different kinds of objects. The perception module uses a faster region-based Convolutional Neural Network (R-CNN) to recognize objects. We designed a novel two finger gripper that incorporates a pneumatic valve based suction effect to enhance its ability to pick different kinds of objects. The system was developed by the IITK-TCS team for participation in the Amazon Picking Challenge 2016 event. The team secured fifth place in the stowing task in the event. The purpose of this article is to share our experiences with students and practicing engineers and enable them to build similar systems. The overall efficacy of the system is demonstrated through several simulations as well as real-world experiments with actual robots.

Submitted 7 March, 2017; originally announced March 2017.

Comments: 15 Pages, 25 Figures, 4 Tables, Journal Paper

arXiv:1604.07778 [pdf, other]

doi 10.3390/e17117658

Neighborhood approximations for non-linear voter models

Authors: Frank Schweitzer, Laxmidhar Behera

Abstract: Non-linear voter models assume that the opinion of an agent depends on the opinions of its neighbors in a non-linear manner. This allows for voting rules different from majority voting. While the linear voter model is known to reach consensus, non-linear voter models can result in the coexistence of opposite opinions. Our aim is to derive approximations to correctly predict the time dependent dynamics, or at least the asymptotic outcome, of such local interactions. Emphasis is on a probabilistic approach to decompose the opinion distribution in a second-order neighborhood into lower-order probability distributions. This is compared with an analytic pair approximation for the expected value of the global fraction of opinions and a mean-field approximation. Our reference case is averaged stochastic simulations of a one-dimensional cellular automaton. We find that the probabilistic second-order approach captures the dynamics of the reference case very well for different non-linearities, i.e., for both majority and minority voting rules, which only partly holds for the first-order pair approximation and not at all for the mean-field approximation. We further discuss the interesting phenomenon of a correlated coexistence, characterized by the formation of large domains of opinions that dominate for some time, but slowly change.

Submitted 15 December, 2015; originally announced April 2016.

Journal ref: Entropy, vol. 17, pp. 7658-7679 (2015)
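
The kind of reference simulation mentioned in the abstract above (stochastic simulation of a one-dimensional cellular automaton) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' code: the function names `step` and `simulate`, the restriction to the two nearest neighbors on a ring, the synchronous update, and the `NONLINEAR_EXAMPLE` values are hypothetical. `flip_prob[k]` is the probability that an agent changes its opinion when `k` of its two neighbors hold the opposite opinion; a rule proportional to the opposing fraction corresponds to a linear voter rule, and any other choice gives a non-linear one.

```python
import random

# Assumed rule tables (hypothetical, for illustration only):
# index k = number of the two nearest neighbours holding the opposite opinion.
LINEAR = [0.0, 0.5, 1.0]             # flip probability = opposing fraction
NONLINEAR_EXAMPLE = [0.0, 0.2, 1.0]  # an arbitrary non-linear choice


def step(state, flip_prob, rng):
    """One synchronous update of binary opinions on a ring."""
    n = len(state)
    new_state = state[:]
    for i in range(n):
        left = state[i - 1]          # index -1 wraps to the last agent
        right = state[(i + 1) % n]
        k = (left != state[i]) + (right != state[i])
        if rng.random() < flip_prob[k]:
            new_state[i] = 1 - state[i]
    return new_state


def simulate(n, steps, flip_prob, seed=0):
    """Run the automaton; return final state and fraction of opinion 1 per step."""
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n)]
    fractions = [sum(state) / n]
    for _ in range(steps):
        state = step(state, flip_prob, rng)
        fractions.append(sum(state) / n)
    return state, fractions


if __name__ == "__main__":
    _, fractions = simulate(n=100, steps=50, flip_prob=NONLINEAR_EXAMPLE)
    print(fractions[0], fractions[-1])
```

Averaging `fractions` over many seeds would give the kind of reference curve against which an approximation could be compared; the neighborhood size and update scheme would need to match the model actually being studied.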

arXiv:1504.07278 [pdf, ps, other]

Optimal Convergence Rate in Feed Forward Neural Networks using HJB Equation

Authors: Vipul Arora, Laxmidhar Behera, Ajay Pratap Yadav

Abstract: A control theoretic approach is presented in this paper for both batch and instantaneous updates of weights in feed-forward neural networks. The popular Hamilton-Jacobi-Bellman (HJB) equation has been used to generate an optimal weight update law. The remarkable contribution of this paper is that closed form solutions for both the optimal cost and the weight update can be achieved for any feed-forward network using the HJB equation in a simple yet elegant manner. The proposed approach has been compared with some of the existing best performing learning algorithms. It is found, as expected, that the proposed approach converges faster in terms of computational time. Benchmark test data such as 8-bit parity, breast cancer and credit approval, as well as a 2D Gabor function, have been used to validate our claims. The paper also discusses issues related to global optimization. The limitations of popular deterministic weight update laws are critiqued and the possibility of global optimization using the HJB formulation is discussed. It is hoped that the proposed algorithm will attract considerable interest from researchers working on fast learning algorithms and global optimization.

Submitted 27 April, 2015; originally announced April 2015.

Comments: 9 pages, journal

arXiv:1202.1201 [pdf, ps, other]

doi 10.1142/S0219525912500592

Optimal migration promotes the outbreak of cooperation in heterogeneous populations

Authors: Frank Schweitzer, Laxmidhar Behera

Abstract: We consider a population of agents that are heterogeneous with respect to (i) their strategy when interacting $n_{g}$ times with other agents in an iterated prisoner's dilemma game, (ii) their spatial location on $K$ different islands. After each generation, agents adopt strategies proportional to their average payoff received. Assuming a mix of two cooperating and two defecting strategies, we first investigate for isolated islands the conditions for an exclusive domination of each of these strategies and their possible coexistence. This allows us to define a threshold frequency for cooperation that, dependent on $n_{g}$ and the initial mix of strategies, describes the outbreak of cooperation in the absence of migration. We then allow migration of a fixed fraction of the population after each generation. Assuming a worst case scenario where all islands are occupied by defecting strategies, whereas only one island is occupied by cooperators at the threshold frequency, we determine the optimal migration rate that allows the outbreak of cooperation on all islands. We further find that the threshold frequency divided by the number of islands, i.e. the relative effort for invading defecting islands with cooperators, decreases with the number of islands. We also show that there is only a small bandwidth of migration rates that allows the outbreak of cooperation. Larger migration rates destroy cooperation.

Submitted 24 June, 2012; v1 submitted 6 February, 2012; originally announced February 2012.

Comments: 29 pp. Submitted to ACS - Advances in Complex Systems (2012)

Journal ref: ACS - Advances in Complex Systems, vol. 15, Suppl. No. 1 (2012) 1250059 (27 pages)

arXiv:cond-mat/0211605 [pdf, ps, other]

Evolution of Cooperation in a Spatial Prisoner's Dilemma

Authors: Frank Schweitzer, Laxmidhar Behera, Heinz Muehlenbein

Abstract: We investigate the spatial distribution and the global frequency of agents who can either cooperate or defect. The agent interaction is described by a deterministic, non-iterated prisoner's dilemma game; furthermore, each agent interacts only locally with its neighbors. Based on a detailed analysis of the local payoff structures, we derive critical conditions for the invasion or the spatial coexistence of cooperators and defectors. These results are summarized in a phase diagram that allows us to identify five regimes, each characterized by distinct spatiotemporal dynamics and a corresponding final spatial structure. In addition to the complete invasion of defectors, we find coexistence regimes with either a majority of cooperators in large spatial domains, or a minority of cooperators organized in small non-stationary domains or in small clusters. The analysis further allowed a verification of computer simulation results by Nowak and May (1993). Finally, we present simulation results of a true 5-person game on a lattice. This modification leads to non-uniform spatial interactions that may even enhance the effect of cooperation. Keywords: Prisoner's dilemma; cooperation; spatial 5-person game

Submitted 26 November, 2002; originally announced November 2002.

Comments: 33 pages, 22 multipart figures, for related papers see http://www.ais.fraunhofer.de/~frank/papers.html

Showing 1–29 of 29 results for author: Behera, L