Search | arXiv e-print repository

Translating Imaging to Genomics: Leveraging Transformers for Predictive Modeling

Authors: Aiman Farooq, Deepak Mishra, Santanu Chaudhury

Abstract: In this study, we present a novel approach for predicting genomic information from medical imaging modalities using a transformer-based model. We aim to bridge the gap between imaging and genomics data by leveraging transformer networks, allowing for accurate genomic profile predictions from CT/MRI images. Presently, most studies rely on the use of whole slide images (WSI) for the association, which are obtained via invasive methodologies. We propose using only available CT/MRI images to predict genomic sequences. Our transformer-based approach is able to efficiently generate associations between multiple sequences based on CT/MRI images alone. This work paves the way for the use of non-invasive imaging modalities for precise and personalized healthcare, allowing for a better understanding of diseases and treatment.

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2407.16908 [pdf, other]

Generation Constraint Scaling Can Mitigate Hallucination

Authors: Georgios Kollias, Payel Das, Subhajit Chaudhury

Abstract: Addressing the issue of hallucinations in large language models (LLMs) is a critical challenge. As the cognitive mechanisms of hallucination have been related to memory, here we explore hallucination for LLMs that are enabled with explicit memory mechanisms. We empirically demonstrate that by simply scaling the readout vector that constrains generation in a memory-augmented LLM decoder, hallucination mitigation can be achieved in a training-free manner. Our method is geometry-inspired and outperforms a state-of-the-art LLM editing method on the task of generation of Wikipedia-like biography entries both in terms of generation quality and runtime complexity.

Submitted 23 July, 2024; originally announced July 2024.

Comments: 7 pages; accepted at ICML 2024 Workshop on Large Language Models and Cognition

arXiv:2407.11172 [pdf]

Micro-Ring Modulator Linearity Enhancement for Analog and Digital Optical Links

Authors: Sumilak Chaudhury, Karl Johnson, Chengkuan Gao, Bill Lin, Yeshaiahu Fainman, Tzu-Chien Hsueh

Abstract: An energy/area-efficient low-cost broadband linearity enhancement technique for electro-optic micro-ring modulators (MRM) is proposed to achieve 6.1-dB dynamic linearity improvement in spurious-free-dynamic-range with intermodulation distortions (IMD) and 17.9-dB static linearity improvement in integral nonlinearity over a conventional notch-filter MRM within a 4.8-dB extinction-ratio (ER) full-scale range based on rapid silicon-photonics fabrication results for the emerging applications of various analog and digital optical communication systems.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 4 pages, 5 figures

arXiv:2407.01619 [pdf, other]

TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes

Authors: Aamod Khatiwada, Harsha Kokel, Ibrahim Abdelaziz, Subhajit Chaudhury, Julian Dolby, Oktie Hassanzadeh, Zhenhan Huang, Tejaswini Pedapati, Horst Samulowitz, Kavitha Srinivas

Abstract: Enterprises have a growing need to identify relevant tables in data lakes; e.g. tables that are unionable, joinable, or subsets of each other. Tabular neural models can be helpful for such data discovery tasks. In this paper, we present TabSketchFM, a neural tabular model for data discovery over data lakes. First, we propose a novel pre-training sketch-based approach to enhance the effectiveness of data discovery techniques in neural tabular models. Second, to further finetune the pretrained model for several downstream tasks, we develop LakeBench, a collection of 8 benchmarks to help with different data discovery tasks such as finding tables that are unionable, joinable, or subsets of each other. We then show on these finetuning tasks that TabSketchFM achieves state-of-the-art performance compared to existing neural models. Third, we use these finetuned models to search for tables that are unionable, joinable, or can be subsets of each other. Our results demonstrate improvements in F1 scores for search compared to state-of-the-art techniques (even up to 70% improvement in a joinable search benchmark). Finally, we show significant transfer across datasets and tasks, establishing that our model can generalize across different tasks over different data lakes.

Submitted 28 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.04217

arXiv:2407.01437 [pdf, other]

Needle in the Haystack for Memory Based Large Language Models

Authors: Elliot Nelson, Georgios Kollias, Payel Das, Subhajit Chaudhury, Soham Dan

Abstract: Current large language models (LLMs) often perform poorly on simple fact retrieval tasks. Here we investigate if coupling a dynamically adaptable external memory to an LLM can alleviate this problem. For this purpose, we test Larimar, a recently proposed language model architecture which uses an external associative memory, on long-context recall tasks including passkey and needle-in-the-haystack tests. We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training. We further show that the latent readouts from the memory (to which long contexts are written) control the decoder towards generating correct outputs, with the memory stored off of the GPU. Compared to existing transformer-based LLM architectures for long-context recall tasks that use larger parameter counts or modified attention mechanisms, a relatively smaller size Larimar is able to maintain strong performance without any task-specific training or training on longer contexts.

Submitted 12 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: 5 pages; slightly revised abstract

arXiv:2404.10174 [pdf, other]

On the Effects of Fine-tuning Language Models for Text-Based Reinforcement Learning

Authors: Mauricio Gruppi, Soham Dan, Keerthiram Murugesan, Subhajit Chaudhury

Abstract: Text-based reinforcement learning involves an agent interacting with a fictional environment using observed text and admissible actions in natural language to complete a task. Previous works have shown that agents can succeed in text-based interactive environments even in the complete absence of semantic understanding or other linguistic capabilities. The success of these agents in playing such games suggests that semantic understanding may not be important for the task. This raises an important question about the benefits of LMs in guiding the agents through the game states. In this work, we show that rich semantic understanding leads to efficient training of text-based RL agents. Moreover, we describe the occurrence of semantic degeneration as a consequence of inappropriate fine-tuning of language models in text-based reinforcement learning (TBRL). Specifically, we describe the shift in the semantic representation of words in the LM, as well as how it affects the performance of the agent in tasks that are semantically similar to the training games. We believe these results may help develop better strategies to fine-tune agents in text-based RL scenarios.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.11901 [pdf, other]

Larimar: Large Language Models with Episodic Memory Control

Authors: Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen

Abstract: Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar not only attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 8-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar

Submitted 6 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: ICML 2024

arXiv:2403.10692 [pdf, other]

EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning

Authors: Kinjal Basu, Keerthiram Murugesan, Subhajit Chaudhury, Murray Campbell, Kartik Talamadupula, Tim Klinger

Abstract: Text-based games (TBGs) have emerged as an important collection of NLP tasks, requiring reinforcement learning (RL) agents to combine natural language understanding with reasoning. A key challenge for agents attempting to solve such tasks is to generalize across multiple games and demonstrate good performance on both seen and unseen objects. Purely deep-RL-based approaches may perform well on seen objects; however, they fail to showcase the same performance on unseen objects. Commonsense-infused deep-RL agents may work better on unseen data; unfortunately, their policies are often not interpretable or easily transferable. To tackle these issues, in this paper, we present EXPLORER which is an exploration-guided reasoning agent for textual reinforcement learning. EXPLORER is neurosymbolic in nature, as it relies on a neural module for exploration and a symbolic module for exploitation. It can also learn generalized symbolic policies and perform well over unseen data. Our experiments show that EXPLORER outperforms the baseline agents on Text-World cooking (TW-Cooking) and Text-World Commonsense (TWC) games.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.06009 [pdf, other]

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Authors: Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Kirushikesh DB, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Nishtha Madaan, Sameep Mehta, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy, et al. (13 additional authors not shown)

Abstract: Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope.

Submitted 13 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.15491 [pdf, other]

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

Authors: Kinjal Basu, Ibrahim Abdelaziz, Subhajit Chaudhury, Soham Dan, Maxwell Crouse, Asim Munawar, Sadhana Kumaravel, Vinod Muthusamy, Pavan Kapanipathi, Luis A. Lastras

Abstract: There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is tremendous interest in methods that can acquire sufficient quantities of train and test data that involve calls to tools / APIs. Two lines of research have emerged as the predominant strategies for addressing this challenge. The first has focused on synthetic data generation techniques, while the second has involved curating task-adjacent datasets which can be transformed into API / Tool-based tasks. In this paper, we focus on the task of identifying, curating, and transforming existing datasets and, in turn, introduce API-BLEND, a large corpora for training and systematic testing of tool-augmented LLMs. The datasets mimic real-world scenarios involving API-tasks such as API / tool detection, slot filling, and sequencing of the detected APIs. We demonstrate the utility of the API-BLEND dataset for both training and benchmarking purposes.

Submitted 20 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: Accepted at ACL'24-main conference

arXiv:2402.07301 [pdf, other]

LISR: Learning Linear 3D Implicit Surface Representation Using Compactly Supported Radial Basis Functions

Authors: Atharva Pandey, Vishal Yadav, Rajendra Nagar, Santanu Chaudhury

Abstract: Implicit 3D surface reconstruction of an object from its partial and noisy 3D point cloud scan is a classical geometry processing and 3D computer vision problem. In the literature, various 3D shape representations have been developed, differing in memory efficiency and shape retrieval effectiveness, such as volumetric, parametric, and implicit surfaces. Radial basis functions provide memory-efficient parameterization of the implicit surface. However, we show that training a neural network using the mean squared error between the ground-truth implicit surface and the linear basis-based implicit surfaces does not converge to the global solution. In this work, we propose locally supported compact radial basis functions for a linear representation of the implicit surface. This representation enables us to generate 3D shapes with arbitrary topologies at any resolution due to their continuous nature. We then propose a neural network architecture for learning the linear implicit shape representation of the 3D surface of an object. We learn linear implicit shapes within a supervised learning framework using ground truth Signed-Distance Field (SDF) data for guidance. The classical strategies face difficulties in finding linear implicit shapes from a given 3D point cloud due to numerical issues (they require inverting a large matrix) in basis and query point selection. The proposed approach achieves a better Chamfer distance than, and an F-score comparable to, the state-of-the-art approach on the benchmark dataset. We also show the effectiveness of the proposed approach by using it for the 3D shape completion task.

Submitted 11 February, 2024; originally announced February 2024.

Journal ref: AAAI 2024
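
As a rough illustration of the linear implicit representation described in the LISR abstract above, the sketch below evaluates f(x) = sum_i w_i * phi(||x - c_i|| / s) with a compactly supported kernel. The Wendland C2 kernel, the function names, and the single shared support radius are assumptions made for illustration; the abstract does not say which kernel or parameterization the paper uses.

```python
import math


def wendland_c2(r):
    """Wendland C2 kernel (an assumed choice): (1 - r)^4 * (4r + 1) for r < 1, else 0."""
    if r >= 1.0:
        return 0.0
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)


def implicit_value(x, centers, weights, support):
    """Evaluate f(x) = sum_i w_i * phi(||x - c_i|| / support).

    Because the kernel vanishes beyond the support radius, only nearby
    centers contribute, which is what keeps the representation local.
    """
    total = 0.0
    for center, weight in zip(centers, weights):
        r = math.dist(x, center) / support
        total += weight * wendland_c2(r)
    return total


# Hypothetical usage: two centers, the second lies outside the support of x.
centers = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
weights = [2.0, 3.0]
value_at_origin = implicit_value((0.0, 0.0, 0.0), centers, weights, support=1.0)
```

The function is linear in the weights, so for fixed centers and support the surface is determined entirely by the weight vector; how those weights are predicted by a network is outside this sketch.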

arXiv:2310.16173 [pdf, other]

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $ε$-Greedy Exploration

Authors: Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury

Abstract: This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $ε$-greedy policy. We prove an iterative procedure with decaying $ε$ converges to the optimal Q-value function geometrically. Moreover, a higher level of $ε$ values enlarges the region of convergence but slows down the convergence, while the opposite holds for a lower level of $ε$ values. Experiments justify our established theoretical insights on DQNs.

Submitted 24 October, 2023; originally announced October 2023.

Journal ref: NeurIPS 2023
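
For readers unfamiliar with the exploration rule analyzed in the DQN abstract above, here is a minimal sketch of ε-greedy action selection with a decaying ε. The geometric decay schedule, its default constants, and the function names are assumptions chosen for illustration; they are not taken from the paper, and the sketch operates on a plain list of Q-values rather than a neural network.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay=0.99):
    """Geometric decay (a hypothetical schedule), floored at eps_end."""
    return max(eps_end, eps_start * decay ** step)


# Hypothetical usage: the agent explores heavily early on and mostly exploits later.
rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]
actions = [epsilon_greedy(q_values, decayed_epsilon(t), rng) for t in range(500)]
```

A larger ε means more random actions and broader coverage of the state space at the cost of slower exploitation, which is the trade-off the abstract describes in terms of convergence region and convergence speed.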

arXiv:2307.04217 [pdf, other]

LakeBench: Benchmarks for Data Discovery over Data Lakes

Authors: Kavitha Srinivas, Julian Dolby, Ibrahim Abdelaziz, Oktie Hassanzadeh, Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Subhajit Chaudhury, Horst Samulowitz

Abstract: Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes.

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2307.02689 [pdf, other]

Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning

Authors: Subhajit Chaudhury, Sarathkrishna Swaminathan, Daiki Kimura, Prithviraj Sen, Keerthiram Murugesan, Rosario Uceda-Sosa, Michiaki Tatsubori, Achille Fokoue, Pavan Kapanipathi, Asim Munawar, Alexander Gray

Abstract: Text-based reinforcement learning agents have predominantly been neural network-based models with embeddings-based representation, learning uninterpretable policies that often do not generalize well to unseen games. On the other hand, neuro-symbolic methods, specifically those that leverage an intermediate formal representation, are gaining significant attention in language understanding tasks. This is because of their advantages ranging from inherent interpretability, the lesser requirement of training data, and being generalizable in scenarios with unseen data. Therefore, in this paper, we propose a modular, NEuro-Symbolic Textual Agent (NESTA) that combines a generic semantic parser with a rule induction system to learn abstract interpretable rules as policies. Our experiments on established text-based game benchmarks show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.

Submitted 5 July, 2023; originally announced July 2023.

Comments: ACL 2023

arXiv:2306.10452 [pdf, other]

MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types

Authors: Keerthiram Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, Maxwell Crouse, Diwakar Mahajan, Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Alexander Gray

Abstract: With the growing interest in large language models, the need for evaluating the quality of machine text compared to reference (typically human-generated) text has become a focus of attention. Most recent works focus either on task-specific evaluation metrics or study the properties of machine-generated text captured by the existing metrics. In this work, we propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts. Inspired by the recent efforts in several NLP tasks for fine-grained evaluation, we introduce a set of 13 mismatch error types such as spatial/geographic errors, entity errors, etc., to guide the model for better prediction of human judgments. We propose a neural framework for evaluating machine texts that uses these mismatch error types as auxiliary tasks and re-purposes the existing single-number evaluation metrics as additional scalar features, in addition to textual features extracted from the machine and reference texts. Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.

Submitted 17 June, 2023; originally announced June 2023.

Comments: Accepted at ACL 2023 (ACL Findings Long)

arXiv:2305.20018 [pdf, other]

Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency

Authors: Maxwell Crouse, Ramon Astudillo, Tahira Naseem, Subhajit Chaudhury, Pavan Kapanipathi, Salim Roukos, Alexander Gray

Abstract: We introduce Logical Offline Cycle Consistency Optimization (LOCCO), a scalable, semi-supervised method for training a neural semantic parser. Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text that are then used as new supervision. To increase the quality of annotations, our method utilizes a count-based prior over valid formal meaning representations and a cycle-consistency score produced by a neural text generation model as additional signals. Both the prior and semantic parser are updated in an alternate fashion from full passes over the training data, which can be seen as approximating the marginalization of latent structures through stochastic variational inference. The use of a count-based prior, frozen text generation model, and offline annotation process yields an approach with negligible complexity and latency increases as compared to conventional self-learning. As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model. We demonstrate the utility of LOCCO on the well-known WebNLG benchmark where we obtain an improvement of 2 points against a self-learning parser under equivalent conditions, an improvement of 1.3 points against the previous state-of-the-art parser, and competitive text generation performance in terms of BLEU score.

Submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.04346 [pdf, other]

Laziness Is a Virtue When It Comes to Compositionality in Neural Semantic Parsing

Authors: Maxwell Crouse, Pavan Kapanipathi, Subhajit Chaudhury, Tahira Naseem, Ramon Astudillo, Achille Fokoue, Tim Klinger

Abstract: Nearly all general-purpose neural semantic parsers generate logical forms in a strictly top-down autoregressive fashion. Though such systems have achieved impressive results across a variety of datasets and domains, recent works have called into question whether they are ultimately limited in their ability to compositionally generalize. In this work, we approach semantic parsing from, quite literally, the opposite direction; that is, we introduce a neural semantic parsing generation method that constructs logical forms from the bottom up, beginning from the logical form's leaves. The system we introduce is lazy in that it incrementally builds up a set of potential semantic parses, but only expands and processes the most promising candidate parses at each generation step. Such a parsimonious expansion scheme allows the system to maintain an arbitrarily large set of parse hypotheses that are never realized and thus incur minimal computational overhead. We evaluate our approach on compositional generalization; specifically, on the challenging CFQ dataset and three Text-to-SQL datasets where we show that our novel, bottom-up semantic parsing technique outperforms general-purpose semantic parsers while also being competitive with comparable neural parsers that have been designed for each task.

Submitted 7 May, 2023; originally announced May 2023.

Comments: Accepted to ACL main conference

arXiv:2303.05556 [pdf, other]

An Evaluation of Non-Contrastive Self-Supervised Learning for Federated Medical Image Analysis

Authors: Soumitri Chattopadhyay, Soham Ganguly, Sreejit Chaudhury, Sayan Nag, Samiran Chattopadhyay

Abstract: Privacy and annotation bottlenecks are two major issues that profoundly affect the practicality of machine learning-based medical image analysis. Although significant progress has been made in these areas, these issues are not yet fully resolved. In this paper, we seek to tackle these concerns head-on and systematically explore the applicability of non-contrastive self-supervised learning (SSL) algorithms under federated learning (FL) simulations for medical image analysis. We conduct thorough experimentation with recently proposed state-of-the-art non-contrastive frameworks under standard FL setups. With the state-of-the-art contrastive learning algorithm SimCLR as our comparative baseline, we benchmark the performance of our 4 chosen non-contrastive algorithms under non-i.i.d. data conditions and with a varying number of clients. We present a holistic evaluation of these techniques on 6 standardized medical imaging datasets. We further analyse different trends inferred from our findings, with the aim of identifying directions for further research. To the best of our knowledge, ours is the first work to perform such a thorough analysis of federated self-supervised learning for medical imaging. All of our source code will be made public upon acceptance of the paper.

Submitted 9 March, 2023; originally announced March 2023.

arXiv:2303.02245 [pdf, other]

Exploring Self-Supervised Representation Learning For Low-Resource Medical Image Analysis

Authors: Soumitri Chattopadhyay, Soham Ganguly, Sreejit Chaudhury, Sayan Nag, Samiran Chattopadhyay

Abstract: The success of self-supervised learning (SSL) has mostly been attributed to the availability of unlabeled yet large-scale datasets. However, in a specialized domain such as medical imaging, which differs considerably from natural images, the assumption of data availability is unrealistic and impractical, as the data itself is scanty and found in small databases, collected for specific prognosis tasks. To this end, we seek to investigate the applicability of self-supervised learning algorithms on small-scale medical imaging datasets. In particular, we evaluate four state-of-the-art SSL methods on three publicly accessible small medical imaging datasets. Our investigation reveals that in-domain low-resource SSL pre-training can yield performance competitive with transfer learning from large-scale datasets (such as ImageNet). Furthermore, we extensively analyse our empirical findings to provide valuable insights that can motivate further research towards circumventing the need for pre-training on a large image corpus. To the best of our knowledge, this is the first attempt to holistically explore self-supervision on low-resource medical datasets.

Submitted 28 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: Accepted at IEEE ICIP 2023

arXiv:2210.12624 [pdf, other]

Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach

Authors: Heshan Fernando, Han Shen, Miao Liu, Subhajit Chaudhury, Keerthiram Murugesan, Tianyi Chen

Abstract: Machine learning problems with multiple objective functions appear either in learning with multiple criteria, where learning has to make a trade-off between multiple performance metrics such as fairness, safety and accuracy; or in multi-task learning, where multiple tasks are optimized jointly, sharing inductive bias between them. These problems are often tackled by the multi-objective optimization framework. However, existing stochastic multi-objective gradient methods and their variants (e.g., MGDA, PCGrad, CAGrad, etc.) all adopt a biased noisy gradient direction, which leads to degraded empirical performance. To this end, we develop a stochastic Multi-objective gradient Correction (MoCo) method for multi-objective optimization. The unique feature of our method is that it can guarantee convergence without increasing the batch size even in the non-convex setting. Simulations on multi-task supervised and reinforcement learning demonstrate the effectiveness of our method relative to state-of-the-art methods.

Submitted 19 March, 2024; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: Changed hyper-parameter choice which affects some of the convergence rate results in the paper

arXiv:2206.12871 [pdf, ps, other]

A Matrix Analogue of Schur-Siegel-Smyth Trace Problem

Authors: Srijonee Shabnam Chaudhury

Abstract: Let $\mathcal{S}$ be the set of all positive-definite, symmetrizable integer matrices with non-zero upper and lower diagonal and let $\mathcal{T}$ be the set of all positive-definite real symmetric matrices with non-zero upper diagonal such that all non-zero entries are square roots of some positive integers and the matrices satisfy a certain cycle condition. In this paper, for any $n \times n$ matrix $A \in \mathcal{S} \cup \mathcal{T}$ and any $k \in \mathbb{N}$ we find a general lower bound for $Tr_{2^k}(A)$, i.e., the sum of the $2^k$-th powers of the eigenvalues of $A$, which depends on $n$ as well as some other variables. In particular, we obtain the best possible lower bound for $Tr_2(A)$, namely $6n - 5$. As a strong outcome of this result we show that the smallest limit point of $\overline{Tr_2(A)} = \frac{Tr_2(A)}{n}$ is $6$. This is a solution of an analogue of the ``Schur-Siegel-Smyth trace problem'' for characteristic polynomials of matrices in $\mathcal{S} \cup \mathcal{T}$. We also obtain a lower bound for the smallest limit point of $\overline{Tr_{2^k}(A)}$ for any positive integer $k > 1$ and for the same set of matrices. Furthermore, we show that the famous results of Smyth on the density of the absolute trace measure and the absolute trace-2 measure of totally positive integers are also true for the set of symmetric integer connected positive definite matrices.

Submitted 23 January, 2024; v1 submitted 26 June, 2022; originally announced June 2022.

Comments: 24 pages

MSC Class: 15A18 15B36 15B57 11C08
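
As a small illustration of the bound $Tr_2(A) \ge 6n - 5$ quoted in the abstract above, the sketch below builds one tridiagonal integer family (first diagonal entry 1, remaining diagonal entries 2, off-diagonal entries 1) and checks that it is positive definite with $Tr_2(A) = 6n - 5$. The family and all helper names are assumptions chosen for illustration and are not taken from the paper. The code relies only on the identity that, for a symmetric matrix, the sum of the squared eigenvalues equals the sum of the squared entries.

```python
def example_matrix(n):
    """Hypothetical family: symmetric tridiagonal integer matrix."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1 if i == 0 else 2          # diagonal: 1, 2, 2, ...
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = 1     # non-zero upper and lower diagonal
    return a


def trace_2(a):
    # For symmetric A: sum of squared eigenvalues = tr(A^2) = sum of squared entries.
    return sum(x * x for row in a for x in row)


def leading_minors(a):
    # Leading principal minors of a symmetric tridiagonal matrix via the
    # three-term recurrence d_k = a_kk * d_{k-1} - b_k^2 * d_{k-2}.
    minors = []
    prev2, prev1 = 1, 1
    for k in range(len(a)):
        off = a[k][k - 1] if k > 0 else 0
        d = a[k][k] * prev1 - off * off * prev2
        minors.append(d)
        prev2, prev1 = prev1, d
    return minors


def is_positive_definite(a):
    # Sylvester's criterion: all leading principal minors are positive.
    return all(d > 0 for d in leading_minors(a))


for n in range(1, 6):
    a = example_matrix(n)
    print(n, is_positive_definite(a), trace_2(a), 6 * n - 5)
```

For each $n$ the last two printed columns agree, so this particular family meets the stated bound; it says nothing about why no matrix in $\mathcal{S} \cup \mathcal{T}$ can go lower, which is the content of the paper.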

arXiv:2203.10888 [pdf, ps, other]

Proposal for Quantum Ciphertext-Policy Attribute-Based Encryption

Authors: Asmita Samanta, Arpita Maitra, Shion Samadder Chaudhury

Abstract: A Quantum Ciphertext-Policy Attribute-Based Encryption scheme (QCP-ABE) is presented. In the classical domain, most of the popular ABE schemes are based on the hardness of the Bilinear Diffie-Hellman Exponent problem, which has been proven to be vulnerable to Shor's algorithm. Recently, some quantum-safe ABE schemes have been proposed exploiting the Lattice problem. However, no efficient Quantum Attribute-Based Encryption scheme has been reported to date. Against this backdrop, we propose a quantum CP-ABE scheme exploiting Quantum Key Distribution (QKD) and Quantum Error Correcting code. A Semi Quantum version of the scheme has also been considered. Finally, we introduce a dynamic access structure in our proposed protocols.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 12 pages

arXiv:2112.14489 [pdf, ps, other]

Sums of squares of integer-multiple of an integral element on real bi-quadratic fields

Authors: Srijonee Shabnam Chaudhury

Abstract: For any given positive integer $m$ we construct certain totally positive algebraic integers $α$ of a real bi-quadratic field $K$ and obtain some necessary conditions under which $mα$ cannot be represented as a sum of integral squares. We show this for integers that lie in quadratic subfields of $K$ and for integers which are in $K$ but not in any quadratic subfield of $K$. We provide examples in tabular form for each case to corroborate the results.

Submitted 9 February, 2024; v1 submitted 29 December, 2021; originally announced December 2021.

Comments: 33 pages

MSC Class: 11E25; 11R16; 11R33

arXiv:2110.10973 [pdf, other]

LOA: Logical Optimal Actions for Text-based Interaction Games

Authors: Daiki Kimura, Subhajit Chaudhury, Masaki Ono, Michiaki Tatsubori, Don Joven Agravante, Asim Munawar, Akifumi Wachi, Ryosuke Kohita, Alexander Gray

Abstract: We present Logical Optimal Actions (LOA), an action decision architecture for reinforcement learning applications built on a neuro-symbolic framework that combines a neural network with a symbolic knowledge acquisition approach for natural language interaction games. The demonstration for LOA experiments consists of a web-based interactive platform for text-based games and a visualization of the acquired knowledge that improves the interpretability of trained rules. This demonstration also provides a comparison module with other neuro-symbolic approaches as well as non-symbolic state-of-the-art agent models on the same text-based games. Our LOA also provides an open-sourced implementation in Python of the reinforcement learning environment to facilitate experiments for studying neuro-symbolic agents. Code: https://github.com/ibm/loa

Submitted 21 October, 2021; originally announced October 2021.

Comments: ACL-IJCNLP 2021 (demo paper)

arXiv:2110.10963 [pdf, other]

Neuro-Symbolic Reinforcement Learning with First-Order Logic

Authors: Daiki Kimura, Masaki Ono, Subhajit Chaudhury, Ryosuke Kohita, Akifumi Wachi, Don Joven Agravante, Michiaki Tatsubori, Asim Munawar, Alexander Gray

Abstract: Deep reinforcement learning (RL) methods often require many trials before convergence, and no direct interpretability of trained policies is provided. In order to achieve fast convergence and interpretability for the policy in RL, we propose a novel RL method for text-based games with a recent neuro-symbolic framework called Logical Neural Network, which can learn symbolic and interpretable rules in its differentiable network. The method first extracts first-order logical facts from the text observation and an external word meaning network (ConceptNet), and then trains a policy in the network with directly interpretable logical operators. Our experimental results show that RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods in a TextWorld benchmark.

Submitted 21 October, 2021; originally announced October 2021.

Comments: EMNLP 2021 (main conference)

arXiv:2109.03575 [pdf, other]

Deriving Explanation of Deep Visual Saliency Models

Authors: Sai Phani Kumar Malladi, Jayanta Mukhopadhyay, Chaker Larabi, Santanu Chaudhury

Abstract: Deep neural networks have shown their profound impact on achieving human-level performance in visual saliency prediction. However, it is still unclear how they learn the task and what it means in terms of understanding the human visual system. In this work, we develop a technique to derive explainable saliency models from their corresponding deep neural architecture based saliency models by applying human perception theories and the conventional concepts of saliency. This technique helps us understand the learning pattern of the deep network at its intermediate layers through their activation maps. Initially, we consider two state-of-the-art deep saliency models, namely UNISAL and MSI-Net, for our interpretation. We use a set of biologically plausible log-Gabor filters for identifying and reconstructing their activation maps using our explainable saliency model. The final saliency map is generated using these reconstructed activation maps. We also build our own deep saliency model named cross-concatenated multi-scale residual block based network (CMRNet) for saliency prediction. Then, we evaluate and compare the performance of the explainable models derived from UNISAL, MSI-Net and CMRNet on three benchmark datasets with other state-of-the-art methods. Hence, we propose that this approach to explainability can be applied to any deep visual saliency model for interpretation, which makes it a generic one.

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2108.04558 [pdf, other]

Understanding Character Recognition using Visual Explanations Derived from the Human Visual System and Deep Networks

Authors: Chetan Ralekar, Shubham Choudhary, Tapan Kumar Gandhi, Santanu Chaudhury

Abstract: Human observers engage in selective information uptake when classifying visual patterns. The same is true of deep neural networks, which currently constitute the best performing artificial vision systems. Our goal is to examine the congruence, or lack thereof, in the information-gathering strategies of the two systems. We have operationalized our investigation as a character recognition task. We have used eye-tracking to assay the spatial distribution of information hotspots for humans via fixation maps and an activation mapping technique for obtaining analogous distributions for deep networks through visualization maps. Qualitative comparison between visualization maps and fixation maps reveals an interesting correlate of congruence. For correctly classified characters, the deep learning model attended to regions of the character similar to those on which humans fixated. On the other hand, when the focused regions are different for humans and deep nets, the characters are typically misclassified by the latter. Hence, we propose to use the visual fixation maps obtained from the eye-tracking experiment as a supervisory input to align the model's focus on relevant character regions. We find that such supervision improves the model's performance significantly and does not require any additional parameters. This approach has the potential to find applications in diverse domains such as medical analysis and surveillance in which explainability helps to determine system fidelity.

Submitted 29 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

arXiv:2106.05387 [pdf, other]

Eye of the Beholder: Improved Relation Generalization for Text-based Reinforcement Learning Agents

Authors: Keerthiram Murugesan, Subhajit Chaudhury, Kartik Talamadupula

Abstract: Text-based games (TBGs) have become a popular proving ground for the demonstration of learning-based agents that make decisions in quasi real-world settings. The crux of the problem for a reinforcement learning agent in such TBGs is identifying the objects in the world, and those objects' relations with that world. While the recent use of text-based resources for increasing an agent's knowledge and improving its generalization has shown promise, we posit in this paper that there is much yet to be learned from visual representations of these same worlds. Specifically, we propose to retrieve images that represent specific instances of text observations from the world and train our agents on such images. This improves the agent's overall understanding of the game 'scene' and objects' relationships to the world around them, and the variety of visual representations on offer allows the agent to generate a better generalization of a relationship. We show that incorporating such images improves the performance of agents in various TBG settings.

Submitted 15 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2103.05322 [pdf, ps, other]

Sums of integral squares in complex bi-quadratic fields and in CM fields

Authors: Srijonee Shabnam Chaudhury

Abstract: Let $K$ be a complex bi-quadratic field with ring of integers $\mathcal{O}_{K}$. For $K = \mathbb{Q}(\sqrt{-m}, \sqrt{n})$, where $m \equiv 3 \pmod 4$ and $n \equiv 1 \pmod 4$, we prove that every algebraic integer can be written as a sum of integral squares. Using this, we prove that for any complex bi-quadratic field $K$, every element of $4\mathcal{O}_K$ can be written as a sum of five integral squares. In addition, we show that the Pythagoras number of the ring of integers of any CM field is at most five. Moreover, we give two classes of complex bi-quadratic fields for which $p(\mathcal{O}_{K}) = 3$ and $p(4\mathcal{O}_{K}) = 3$ respectively. Here, $p(\mathcal{O}_{K})$ is the Pythagoras number of the ring of integers of $K$ and $p(4\mathcal{O}_{K})$ is the smallest positive integer $t$ such that every element of $4\mathcal{O}_{K}$ can be written as a sum of $t$ integral squares.

Submitted 9 March, 2021; originally announced March 2021.

Comments: 11 pages. arXiv admin note: text overlap with arXiv:2005.13870

arXiv:2103.02363 [pdf, other]

Reinforcement Learning with External Knowledge by using Logical Neural Networks

Authors: Daiki Kimura, Subhajit Chaudhury, Akifumi Wachi, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori, Alexander Gray

Abstract: Conventional deep reinforcement learning methods are sample-inefficient and usually require a large number of training trials before convergence. Since such methods operate on an unconstrained action set, they can lead to useless actions. A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic. An LNN functions as an end-to-end differentiable network that minimizes a novel contradiction loss to learn interpretable rules. In this paper, we utilize LNNs to define an inference graph using basic logical operations, such as AND and NOT, for faster convergence in reinforcement learning. Specifically, we propose an integrated method that enables model-free reinforcement learning from external knowledge sources in an LNN-based logically constrained framework, such as action shielding and guide. Our results empirically demonstrate that our method converges faster compared to a model-free reinforcement learning method that does not have such logical constraints.

Submitted 3 March, 2021; originally announced March 2021.

Comments: KBRL Workshop at IJCAI-PRICAI 2020
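
The idea of constraining the action set with logical operations such as AND and NOT, mentioned in the abstract above, can be sketched with real-valued logic. The example below uses unweighted Łukasiewicz connectives on truth values in [0, 1]; this choice is an assumption made for illustration, since the abstract does not specify the operators. The facts, the rules, and the `shield` helper are likewise hypothetical and do not reproduce the authors' implementation.

```python
def l_and(x, y):
    """Lukasiewicz conjunction on truth values in [0, 1] (assumed operator)."""
    return max(0.0, x + y - 1.0)


def l_not(x):
    """Lukasiewicz negation."""
    return 1.0 - x


def shield(actions, facts, threshold=0.5):
    """Keep only actions whose (hypothetical) precondition rule evaluates
    to at least `threshold` under the current facts."""
    return [name for name, rule in actions.items() if rule(facts) >= threshold]


# Hypothetical truth values extracted from a text observation.
facts = {"door_visible": 1.0, "door_open": 0.0, "has_key": 0.75}

# Hypothetical precondition rules built only from AND and NOT.
actions = {
    "open door": lambda f: l_and(f["door_visible"], l_not(f["door_open"])),
    "close door": lambda f: l_and(f["door_visible"], f["door_open"]),
    "unlock door": lambda f: l_and(f["has_key"], l_not(f["door_open"])),
}

print(shield(actions, facts))
```

An agent would then choose only among the returned actions, which is one plausible way a shield could shrink the search space and speed up convergence.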

arXiv:2012.01832 [pdf, other]

doi 10.1117/1.JEI.30.2.023016

Image inpainting using frequency domain priors

Authors: Hiya Roy, Subhajit Chaudhury, Toshihiko Yamasaki, Tatsuaki Hashimoto

Abstract: In this paper, we present a novel image inpainting technique using frequency domain information. Prior works on image inpainting predict the missing pixels by training neural networks using only the spatial domain information. However, these methods still struggle to reconstruct high-frequency details for real complex scenes, leading to a discrepancy in color, boundary artifacts, distorted patterns, and blurry textures. To alleviate these problems, we investigate if it is possible to obtain better performance by training the networks using frequency domain information (Discrete Fourier Transform) along with the spatial domain information. To this end, we propose a frequency-based deconvolution module that enables the network to learn the global context while selectively reconstructing the high-frequency components. We evaluate our proposed method on the publicly available datasets CelebA, Paris Streetview, and DTD texture dataset, and show that our method outperforms current state-of-the-art image inpainting techniques both qualitatively and quantitatively.

Submitted 3 December, 2020; originally announced December 2020.

arXiv:2010.13839 [pdf, other]

VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement Learning

Authors: Thomas Carta, Subhajit Chaudhury, Kartik Talamadupula, Michiaki Tatsubori

Abstract: We present VisualHints, a novel environment for multimodal reinforcement learning (RL) involving text-based interactions along with visual hints (obtained from the environment). Real-life problems often demand that agents interact with the environment using both natural language information and visual perception towards solving a goal. However, most traditional RL environments either solve pure vision-based tasks like Atari games or video-based robotic manipulation; or entirely use natural language as a mode of interaction, like Text-based games and dialog systems. In this work, we aim to bridge this gap and unify these two approaches in a single environment for multimodal RL. We introduce an extension of the TextWorld cooking environment with the addition of visual clues interspersed throughout the environment. The goal is to force an RL agent to use both text and visual features to predict natural language action commands for solving the final task of cooking a meal. We enable variations and difficulties in our environment to emulate various interactive real-world scenarios. We present a baseline multimodal agent for solving such problems using CNN-based feature extraction from visual hints and LSTMs for textual feature extraction. We believe that our proposed visual-lingual environment will facilitate novel problem settings for the RL community.

Submitted 26 October, 2020; originally announced October 2020.

Comments: Code is available at http://ibm.biz/VisualHints

arXiv:2009.11896 [pdf, other]

Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Games

Authors: Subhajit Chaudhury, Daiki Kimura, Kartik Talamadupula, Michiaki Tatsubori, Asim Munawar, Ryuki Tachibana

Abstract: We show that Reinforcement Learning (RL) methods for solving Text-Based Games (TBGs) often fail to generalize on unseen games, especially in small data regimes. To address this issue, we propose Context Relevant Episodic State Truncation (CREST) for irrelevant token removal in observation text for improved generalization. Our method first trains a base model using Q-learning, which typically overfits the training games. The base model's action token distribution is used to perform observation pruning that removes irrelevant tokens. A second bootstrapped model is then retrained on the pruned observation text. Our bootstrapped agent shows improved generalization in solving unseen TextWorld games, using 10x-20x fewer training games compared to previous state-of-the-art methods while also requiring fewer training episodes.

Submitted 24 September, 2020; originally announced September 2020.

Comments: Accepted to EMNLP 2020

arXiv:2009.01478 [pdf, ps, other]

Pressure induced emergence of visible luminescence in $Cs_3Bi_2Br_9$: Effect of structural distortion in optical behaviour

Authors: Debabrata Samanta, Pinku Saha, Bishnupada Ghosh, Sonu Pratap Chaudhury, Sayan Bhattacharya, Swastika Chatterjee, Goutam Dev Mukherjee

Abstract: We report the emergence of photoluminescence at room temperature in trigonal $Cs_3Bi_2Br_9$ at high pressures. Enhancement in intensity with pressure is found to be driven by an increase in distortion of the $BiBr_6$ octahedra and by iso-structural transitions. Electronic band structure calculations show the sample in the high-pressure phase to be an indirect band gap semiconductor. The luminescence peak profile shows signatures related to the recombination of free and self-trapped excitons, respectively. The blue shift of both peaks up to about 4.4 GPa is due to exciton recombination before relaxation, caused by the decrease in exciton lifetime with scattering from phonons.

Submitted 3 September, 2020; originally announced September 2020.

arXiv:2008.03205 [pdf, other]

Multi-Task Driven Explainable Diagnosis of COVID-19 using Chest X-ray Images

Authors: Aakarsh Malhotra, Surbhi Mittal, Puspita Majumdar, Saheb Chhabra, Kartik Thakral, Mayank Vatsa, Richa Singh, Santanu Chaudhury, Ashwin Pudrod, Anjali Agrawal

Abstract: With the increasing number of COVID-19 cases globally, all countries are ramping up their testing numbers. While RT-PCR kits are available in sufficient quantity in several countries, others are facing challenges with limited availability of testing kits and processing centers in remote areas. This has motivated researchers to find alternate methods of testing which are reliable, easily accessible and faster. Chest X-Ray is one of the modalities that is gaining acceptance as a screening modality. In this direction, the paper makes two primary contributions. Firstly, we present the COVID-19 Multi-Task Network, which is an automated end-to-end network for COVID-19 screening. The proposed network not only predicts whether the CXR has COVID-19 features present or not, it also performs semantic segmentation of the regions of interest to make the model explainable. Secondly, with the help of medical professionals, we manually annotate the lung regions of 9000 frontal chest radiographs taken from ChestXray-14, CheXpert and a consolidated COVID-19 dataset. Further, 200 chest radiographs pertaining to COVID-19 patients are also annotated for semantic segmentation. This database will be released to the research community.

Submitted 3 August, 2020; originally announced August 2020.

arXiv:2005.13870 [pdf, ps, other]

Sums of Integral Squares In Certain Complex Bi-quadratic Fields

Authors: Srijonee Shabnam Chaudhury

Abstract: Let K be an algebraic number field and O_K be its ring of integers. Let S_K be the set of elements in O_K which are sums of squares in O_K and s(O_K) the minimal number of squares necessary to represent -1 in O_K. Let g(S_K) be the smallest positive integer t such that every element in S_K is a sum of t squares in O_K. When K is generated over the field of rational numbers by the square roots of m and -n, where m congruent to 3 mod 4 and n congruent to 1 mod 4 are two distinct positive square-free integers, we prove that S_K = O_K. We also prove that g(O_K) is less than or equal to s(O_K)+1 or s(O_K)+2. Applying this, we show that if s(O_K)=2, then g(O_K)=3. This work is a continuation of a recent study initiated by Zhang and Ji.

Submitted 28 May, 2020; originally announced May 2020.

Comments: 10 pages

arXiv:2003.06646 [pdf, other]

Investigating Generalization in Neural Networks under Optimally Evolved Training Perturbations

Authors: Subhajit Chaudhury, Toshihiko Yamasaki

Abstract: In this paper, we study the generalization properties of neural networks under input perturbations and show that minimal training data corruption by a few pixel modifications can cause drastic overfitting. We propose an evolutionary algorithm to search for optimal pixel perturbations using a novel cost function, inspired by the domain adaptation literature, that explicitly maximizes the generalization gap and domain divergence between clean and corrupted images. Our method outperforms previous pixel-based data distribution shift methods on state-of-the-art Convolutional Neural Network (CNN) architectures. Interestingly, we find that the choice of optimization plays an important role in generalization robustness due to the empirical observation that SGD is resilient to such training data corruption unlike adaptive optimization techniques (ADAM). Our source code is available at https://github.com/subhajitchaudhury/evo-shift.

Submitted 14 March, 2020; originally announced March 2020.

Comments: Accepted at IEEE ICASSP 2020

arXiv:2002.08097 [pdf, other]

doi 10.1109/ISM46123.2019.00011

Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos

Authors: Subhajit Chaudhury, Daiki Kimura, Phongtharin Vinayavekhin, Asim Munawar, Ryuki Tachibana, Koji Ito, Yuki Inaba, Minoru Matsumoto, Shuji Kidokoro, Hiroki Ozaki

Abstract: Image-based sports analytics enable automatic retrieval of key events in a game to speed up the analytics process for human experts. However, most existing methods focus on structured television broadcast video datasets with a straight and fixed camera having minimum variability in the capturing pose. In this paper, we study the case of event detection in sports videos for unstructured environments with arbitrary camera angles. The transition from structured to unstructured video analysis produces multiple challenges that we address in our paper. Specifically, we identify and solve two major problems: unsupervised identification of players in an unstructured setting and generalization of the trained models to pose variations due to arbitrary shooting angles. For the first problem, we propose a temporal feature aggregation algorithm using person re-identification features to obtain high player retrieval precision by boosting a weak heuristic scoring method. Additionally, we propose a data augmentation technique, based on a multi-modal image translation model, to reduce bias in the appearance of training samples. Experimental evaluations show that our proposed method improves precision for player retrieval from 0.78 to 0.86 for obliquely angled videos. Additionally, we obtain an improvement in F1 score for rally detection in table tennis videos from 0.79 in the case of global frame-level features to 0.89 using our proposed player-level features. Please see the supplementary video submission at https://ibm.biz/BdzeZA.

Submitted 19 February, 2020; originally announced February 2020.

Comments: Accepted to IEEE International Symposium on Multimedia, 2019

arXiv:2001.05878 [pdf, other]

Assessing Robustness of Deep learning Methods in Dermatological Workflow

Authors: Sourav Mishra, Subhajit Chaudhury, Hideaki Imaizumi, Toshihiko Yamasaki

Abstract: This paper aims to evaluate the suitability of current deep learning methods for clinical workflows, focusing on dermatology. Although deep learning methods have been applied to reach dermatologist-level accuracy in several individual conditions, they have not been rigorously tested for common clinical complaints. Most projects involve data acquired in well-controlled laboratory conditions. This may not reflect regular clinical evaluation, where the corresponding image quality is not always ideal. We test the robustness of deep learning methods by simulating non-ideal characteristics on user-submitted images of ten classes of diseases. Assessing via imitated conditions, we have found that the overall accuracy drops and individual predictions change significantly in many cases despite robust training.

Submitted 17 March, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: Accepted in ACM CHIL 2020 Workshop (Oral and poster, without publication)

arXiv:2001.03463 [pdf, other]

Compressive sensing based privacy for fall detection

Authors: Ronak Gupta, Prashant Anand, Santanu Chaudhury, Brejesh Lall, Sanjay Singh

Abstract: Fall detection holds immense importance in the field of healthcare, where timely detection allows for instant medical assistance. In this context, we propose a 3D ConvNet architecture which consists of 3D Inception modules for fall detection. The proposed architecture is a custom version of the Inflated 3D (I3D) architecture that takes compressed measurements of a video sequence, obtained from a compressive sensing framework, as spatio-temporal input, rather than the video sequence itself as in the I3D convolutional neural network. This is adopted since privacy raises a huge concern for patients being monitored through these RGB cameras. The proposed framework for fall detection is flexible enough with respect to a wide variety of measurement matrices. Ten action classes, randomly selected from Kinetics-400 with no fall examples, are employed to train our 3D ConvNet after compressive sensing with different types of sensing matrices on the original video clips. Our results show that 3D ConvNet performance remains unchanged with different sensing matrices. Also, the performance obtained with the Kinetics pre-trained 3D ConvNet on compressively sensed fall videos from benchmark datasets is better than the state-of-the-art techniques.

Submitted 10 January, 2020; originally announced January 2020.

Comments: accepted in NCVPRIPG 2019

arXiv:1912.07974 [pdf, ps, other]

Pulling a folded polymer through a nanopore

Authors: Bappa Ghosh, Jalal Sarabadani, Srabanti Chaudhury, Tapio Ala-Nissila

Abstract: We investigate the translocation dynamics of a folded linear polymer which is pulled through a nanopore by an external force. To this end, we generalize the iso-flux tension propagation (IFTP) theory for end-pulled polymer translocation to include the case of two segments of the folded polymer traversing simultaneously through the pore. Our theory is extensively benchmarked with corresponding Molecular Dynamics (MD) simulations. The translocation process for a folded polymer can be divided into two main stages. In the first stage, both branches are traversing the pore and their dynamics is coupled. If the branches are not of equal length, there is a second stage where translocation of the shorter branch has been completed. Using the assumption of equal monomer flux of both branches, we analytically derive the equations of motion for both branches and characterise the translocation dynamics in detail from the average waiting time and its scaling form. Moreover, MD simulations are used to study additional details of translocation dynamics such as the translocation time distribution and individual monomer velocities.

Submitted 17 December, 2019; originally announced December 2019.

arXiv:1904.06683 [pdf]

Lunar surface image restoration using U-net based deep neural networks

Authors: Hiya Roy, Subhajit Chaudhury, Toshihiko Yamasaki, Danielle DeLatte, Makiko Ohtake, Tatsuaki Hashimoto

Abstract: Image restoration is a technique that reconstructs a feasible estimate of the original image from a noisy observation. In this paper, we present a U-Net based deep neural network model to restore the missing pixels in a lunar surface image in a context-aware fashion, which is often known as the image inpainting problem. We use grayscale images of the lunar surface captured by the Multiband Imager (MI) onboard the Kaguya satellite for our experiments, and the results show that our method can reconstruct the lunar surface image with good visual quality and improved PSNR values.

Submitted 14 April, 2019; originally announced April 2019.

arXiv:1904.01215 [pdf, other]

DSAL-GAN: Denoising based Saliency Prediction with Generative Adversarial Networks

Authors: Prerana Mukherjee, Manoj Sharma, Megh Makwana, Ajay Pratap Singh, Avinash Upadhyay, Akkshita Trivedi, Brejesh Lall, Santanu Chaudhury

Abstract: Synthesizing high quality saliency maps from noisy images is a challenging problem in computer vision and has many practical applications. Samples generated by existing techniques for saliency detection cannot handle the noise perturbations smoothly and fail to delineate the salient objects present in the given scene. In this paper, we present a novel end-to-end coupled Denoising based Saliency Prediction with Generative Adversarial Network (DSAL-GAN) framework to address the problem of salient object detection in noisy images. DSAL-GAN consists of two generative adversarial networks (GANs) trained end-to-end to perform denoising and saliency prediction together in a holistic manner. The first GAN consists of a generator which denoises the noisy input image, and in the discriminator counterpart we check whether the output is a denoised image or ground truth original image. The second GAN predicts the saliency maps from raw pixels of the input denoised image using a data-driven metric based on saliency prediction method with adversarial loss. Cycle consistency loss is also incorporated to further improve salient region prediction. We demonstrate with comprehensive evaluation that the proposed framework outperforms several baseline saliency models on various performance benchmarks.

Submitted 2 April, 2019; originally announced April 2019.

arXiv:1811.03692 [pdf, other]

Mode matching in GANs through latent space learning and inversion

Authors: Deepak Mishra, Prathosh A. P., Aravind Jayendran, Varun Srivastava, Santanu Chaudhury

Abstract: Generative adversarial networks (GANs) have shown remarkable success in the generation of unstructured data, such as natural images. However, discovery and separation of modes in the generated space, essential for several tasks beyond naive data generation, is still a challenge. In this paper, we address the problem of imposing desired modal properties on the generated space using a latent distribution, engineered in accordance with the modal properties of the true data distribution. This is achieved by training a latent space inversion network in tandem with the generative network using a divergence loss. The latent space is made to follow a continuous multimodal distribution generated by reparameterization of a pair of continuous and discrete random variables. In addition, the modal priors of the latent distribution are learned to match the true data distribution using minimal supervision, with a negligible increase in the number of learnable parameters. We validate our method on multiple tasks such as mode separation, conditional generation, and attribute discovery on multiple real world image datasets and demonstrate its efficacy over other state-of-the-art methods.

Submitted 24 March, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

arXiv:1810.01108 [pdf, other]

Injective State-Image Mapping facilitates Visual Adversarial Imitation Learning

Authors: Subhajit Chaudhury, Daiki Kimura, Asim Munawar, Ryuki Tachibana

Abstract: The growing use of virtual autonomous agents in applications like games and entertainment demands better control policies for natural-looking movements and actions. Unlike the conventional approach of hard-coding motion routines, we propose a deep learning method for obtaining control policies by directly mimicking raw video demonstrations. Previous methods in this domain rely on extracting low-dimensional features from expert videos followed by a separate hand-crafted reward estimation step. We propose an imitation learning framework that reduces the dependence on hand-engineered reward functions by jointly learning the feature extraction and reward estimation steps using Generative Adversarial Networks (GANs). Our main contribution in this paper is to show that under an injective mapping between low-level joint state (angles and velocities) trajectories and the corresponding raw video stream, performing adversarial imitation learning on video demonstrations is equivalent to learning from the state trajectories. Experimental results show that the proposed adversarial learning method from raw videos produces a similar performance to state-of-the-art imitation learning techniques while frequently outperforming existing hand-crafted video imitation methods. Furthermore, we show that our method can learn action policies by imitating video demonstrations on YouTube with similar performance to learned agents from true reward signals. Please see the supplementary video submission at https://ibm.biz/BdzzNA.

Submitted 25 October, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

Comments: Updated the paper to match with version accepted at IEEE MMSP 2019

arXiv:1809.08925 [pdf, other]

Constrained Exploration and Recovery from Experience Shaping

Authors: Tu-Hoa Pham, Giovanni De Magistris, Don Joven Agravante, Subhajit Chaudhury, Asim Munawar, Ryuki Tachibana

Abstract: We consider the problem of reinforcement learning under safety requirements, in which an agent is trained to complete a given task, typically formalized as the maximization of a reward signal over time, while concurrently avoiding undesirable actions or states, associated with lower rewards or penalties. The construction and balancing of different reward components can be difficult in the presence of multiple objectives, yet is crucial for producing a satisfying policy. For example, in reaching a target while avoiding obstacles, low collision penalties can lead to reckless movements while high penalties can discourage exploration. To circumvent this limitation, we examine the effect of past actions in terms of safety to estimate which are acceptable or should be avoided in the future. We then actively reshape the action space of the agent during reinforcement learning, so that reward-driven exploration is constrained within safety limits. We propose an algorithm enabling the learning of such safety constraints in parallel with reinforcement learning and demonstrate its effectiveness in terms of both task completion and training time.

Submitted 21 September, 2018; originally announced September 2018.

Comments: Code: https://github.com/IBM/constrained-rl

arXiv:1807.01990 [pdf, other]

Transfer Learning From Synthetic To Real Images Using Variational Autoencoders For Precise Position Detection

Authors: Tadanobu Inoue, Subhajit Chaudhury, Giovanni De Magistris, Sakyasingha Dasgupta

Abstract: Capturing and labeling camera images in the real world is an expensive task, whereas synthesizing labeled images in a simulation environment is easy for collecting large-scale image data. However, learning from only synthetic images may not achieve the desired performance in the real world due to a gap between synthetic and real images. We propose a method that transfers learned detection of an object position from a simulation environment to the real world. This method uses only a significantly limited dataset of real images while leveraging a large dataset of synthetic images using variational autoencoders. Additionally, the proposed method consistently performed well in different lighting conditions, in the presence of other distractor objects, and on different backgrounds. Experimental results showed that it achieved an accuracy of 1.5 mm to 3.5 mm on average. Furthermore, we showed how the method can be used in a real-world scenario like a "pick-and-place" robotic task.

Submitted 4 July, 2018; originally announced July 2018.

arXiv:1806.08523 [pdf, ps, other]

Focusing on What is Relevant: Time-Series Learning and Understanding using Attention

Authors: Phongtharin Vinayavekhin, Subhajit Chaudhury, Asim Munawar, Don Joven Agravante, Giovanni De Magistris, Daiki Kimura, Ryuki Tachibana

Abstract: This paper is a contribution towards the interpretability of deep learning models in different applications of time-series. We propose a temporal attention layer that is capable of selecting the relevant information to perform various tasks, including data completion, key-frame detection and classification. The method uses the whole input sequence to calculate an attention value for each time step. This results in more focused attention values and more plausible visualisation than previous methods. We apply the proposed method to three different tasks. Experimental results show that the proposed network produces results comparable to the state of the art. In addition, the network provides better interpretability of the decision, that is, it assigns larger attention weights to related frames than similar techniques attempted in the past.

Submitted 22 June, 2018; originally announced June 2018.

Comments: To appear in ICPR 2018

arXiv:1806.01267 [pdf, other]

Internal Model from Observations for Reward Shaping

Authors: Daiki Kimura, Subhajit Chaudhury, Ryuki Tachibana, Sakyasingha Dasgupta

Abstract: Reinforcement learning methods require careful design involving a reward function to obtain the desired action policy for a given task. In the absence of hand-crafted reward functions, prior work on the topic has proposed several methods for reward estimation by using expert state trajectories and action pairs. However, there are cases where complete or good action information cannot be obtained from expert demonstrations. We propose a novel reinforcement learning method in which the agent learns an internal model of observation on the basis of expert-demonstrated state trajectories to estimate rewards without completely learning the dynamics of the external environment from state-action pairs. The internal model is obtained in the form of a predictive model for the given expert state distribution. During reinforcement learning, the agent predicts the reward as a function of the difference between the actual state and the state predicted by the internal model. We conducted multiple experiments in environments of varying complexity, including the Super Mario Bros and Flappy Bird games. We show our method successfully trains good policies directly from expert game-play videos.

Submitted 14 October, 2018; v1 submitted 2 June, 2018; originally announced June 2018.

Comments: 7 pages, 6 figures, ICML workshop (ALA 2018)

arXiv:1805.00223 [pdf, other]

Localization: A Missing Link in the Pipeline of Object Matching and Registration

Authors: Deepak Mishra, Rajeev Ranjan, Santanu Chaudhury, Mukul Sarkar, Arvinder Singh Soin

Abstract: Image registration is the process of aligning two or more images of the same objects using a geometric transformation. Most of the existing approaches work on the assumption of location invariance. These approaches require object-centric images to perform matching. Further, in the absence of intensity level symmetry between the corresponding points in two images, learning-based registration approaches rely on synthetic deformations, which often fail in real scenarios. To address these issues, a combination of convolutional neural networks (CNNs) to perform the desired registration is developed in this work. The complete objective is divided into three sub-objectives: object localization, segmentation and matching transformation. The object localization step establishes an initial correspondence between the images. A modified version of the single shot multi-box detector is used for this purpose. The detected region is cropped to make the images object-centric. Subsequently, the objects are segmented and matched using a spatial transformer network employing thin plate spline deformation. Initial experiments on the MNIST and Caltech-101 datasets show that the proposed model is able to produce accurate matching. Quantitative evaluation using the dice coefficient (DC) and mean intersection over union (mIoU) shows that the proposed method achieves values of 79% and 66%, respectively, on the MNIST dataset and 94% and 90%, respectively, on the Caltech-101 dataset. The proposed framework is extended to the registration of CT and US images, which is free from any data specific assumptions and has better generalization capability as compared to the existing rule-based/classical approaches.

Submitted 11 January, 2019; v1 submitted 1 May, 2018; originally announced May 2018.

Comments: 11 pages, 6 figures

Showing 1–50 of 73 results for author: Chaudhury, S