-
Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning
Authors:
Mainak Singha,
Ankit Jha,
Divyam Gupta,
Pranav Singla,
Biplab Banerjee
Abstract:
We address the challenges inherent in sketch-based image retrieval (SBIR) across various settings, including zero-shot SBIR, generalized zero-shot SBIR, and fine-grained zero-shot SBIR, by leveraging the vision-language foundation model, CLIP. While recent endeavors have employed CLIP to enhance SBIR, these approaches predominantly follow uni-modal prompt processing and overlook to fully exploit C…
▽ More
We address the challenges inherent in sketch-based image retrieval (SBIR) across various settings, including zero-shot SBIR, generalized zero-shot SBIR, and fine-grained zero-shot SBIR, by leveraging the vision-language foundation model, CLIP. While recent endeavors have employed CLIP to enhance SBIR, these approaches predominantly follow uni-modal prompt processing and overlook to fully exploit CLIP's integrated visual and textual capabilities. To bridge this gap, we introduce SpLIP, a novel multi-modal prompt learning scheme designed to operate effectively with frozen CLIP backbones. We diverge from existing multi-modal prompting methods that either treat visual and textual prompts independently or integrate them in a limited fashion, leading to suboptimal generalization. SpLIP implements a bi-directional prompt-sharing strategy that enables mutual knowledge exchange between CLIP's visual and textual encoders, fostering a more cohesive and synergistic prompt processing mechanism that significantly reduces the semantic gap between the sketch and photo embeddings. In addition to pioneering multi-modal prompt learning, we propose two innovative strategies for further refining the embedding space. The first is an adaptive margin generation for the sketch-photo triplet loss, regulated by CLIP's class textual embeddings. The second introduces a novel task, termed conditional cross-modal jigsaw, aimed at enhancing fine-grained sketch-photo alignment, by focusing on implicitly modelling the viable patch arrangement of sketches using knowledge of unshuffled photos. Our comprehensive experimental evaluations across multiple benchmarks demonstrate the superior performance of SpLIP in all three SBIR scenarios. Code is available at https://github.com/mainaksingha01/SpLIP.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Present and Future of AI in Renewable Energy Domain : A Comprehensive Survey
Authors:
Abdur Rashid,
Parag Biswas,
Angona Biswas,
MD Abdullah Al Nasim,
Kishor Datta Gupta,
Roy George
Abstract:
Artificial intelligence (AI) has become a crucial instrument for streamlining processes in various industries, including electrical power systems, as a result of recent digitalization. Algorithms for artificial intelligence are data-driven models that are based on statistical learning theory and are used as a tool to take use of the data that the power system and its users generate. Initially, we…
▽ More
Artificial intelligence (AI) has become a crucial instrument for streamlining processes in various industries, including electrical power systems, as a result of recent digitalization. Algorithms for artificial intelligence are data-driven models that are based on statistical learning theory and are used as a tool to take use of the data that the power system and its users generate. Initially, we perform a thorough literature analysis of artificial intelligence (AI) applications related to renewable energy (RE). Next, we present a thorough analysis of renewable energy factories and assess their suitability, along with a list of the most widely used and appropriate AI algorithms. Nine AI-based strategies are identified here to assist Renewable Energy (RE) in contemporary power systems. This survey paper comprises an extensive review of the several AI techniques used for renewable energy as well as a methodical analysis of the literature for the study of various intelligent system application domains across different disciplines of renewable energy. This literature review identifies the performance and outcomes of nine different research methods by assessing them, and it aims to distill valuable insights into their strengths and limitations. This study also addressed three main topics: using AI technology for renewable power generation, utilizing AI for renewable energy forecasting, and optimizing energy systems. Additionally, it explored AI's superiority over conventional models in controllability, data handling, cyberattack prevention, smart grid implementation, robotics- AI's significance in shaping the future of the energy industry. Furthermore, this article outlines future directions in the integration of AI for renewable energy.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
AI-Driven Approaches for Optimizing Power Consumption: A Comprehensive Survey
Authors:
Parag Biswas,
Abdur Rashid,
Angona Biswas,
Md Abdullah Al Nasim,
Kishor Datta Gupta,
Roy George
Abstract:
Reduced environmental effect, lower operating costs, and a stable and sustainable energy supply for current and future generations are the main reasons why power optimization is important. Power optimization makes ensuring that energy is used more effectively, cutting down on waste and optimizing the utilization of resources.In today's world, power optimization and artificial intelligence (AI) int…
▽ More
Reduced environmental effect, lower operating costs, and a stable and sustainable energy supply for current and future generations are the main reasons why power optimization is important. Power optimization makes ensuring that energy is used more effectively, cutting down on waste and optimizing the utilization of resources.In today's world, power optimization and artificial intelligence (AI) integration are essential to changing the way energy is produced, used, and distributed. Real-time monitoring and analysis of power usage trends is made possible by AI-driven algorithms and predictive analytics, which enable dynamic modifications to effectively satisfy demand. Efficiency and sustainability are increased when power consumption is optimized in different sectors thanks to the use of intelligent systems. This survey paper comprises an extensive review of the several AI techniques used for power optimization as well as a methodical analysis of the literature for the study of various intelligent system application domains across different disciplines of power consumption.This literature review identifies the performance and outcomes of 17 different research methods by assessing them, and it aims to distill valuable insights into their strengths and limitations. Furthermore, this article outlines future directions in the integration of AI for power consumption optimization.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
Python-based DSL for generating Verilog model of Synchronous Digital Circuits
Authors:
Mandar Datar,
Dhruva S. Hegde,
Vendra Durga Prasad,
Manish Prajapati,
Neralla Manikanta,
Devansh Gupta,
Janampalli Pavanija,
Pratyush Pare,
Akash,
Shivam Gupta,
Sachin B. Patkar
Abstract:
We have designed a Python-based Domain Specific Language (DSL) for modeling synchronous digital circuits. In this DSL, hardware is modeled as a collection of transactions -- running in series, parallel, and loops. When the model is executed by a Python interpreter, synthesizable and behavioural Verilog is generated as output, which can be integrated with other RTL designs or directly used for FPGA…
▽ More
We have designed a Python-based Domain Specific Language (DSL) for modeling synchronous digital circuits. In this DSL, hardware is modeled as a collection of transactions -- running in series, parallel, and loops. When the model is executed by a Python interpreter, synthesizable and behavioural Verilog is generated as output, which can be integrated with other RTL designs or directly used for FPGA and ASIC flows. In this paper, we describe - 1) the language (DSL), which allows users to express computation in series/parallel/loop constructs, with explicit cycle boundaries, 2) the internals of a simple Python implementation to produce synthesizable Verilog, and 3) several design examples and case studies for applications in post-quantum cryptography, stereo-vision, digital signal processing and optimization techniques. In the end, we list ideas to extend this framework.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation
Authors:
Chanda Grover Kamra,
Indra Deep Mastan,
Nitin Kumar,
Debayan Gupta
Abstract:
Recent developments in self-supervised learning (SSL) have made it possible to learn data representations without the need for annotations. Inspired by the non-contrastive SSL approach (SimSiam), we introduce a novel framework SIMSAM to compute the Semantic Affinity Matrix, which is significant for unsupervised image segmentation. Given an image, SIMSAM first extracts features using pre-trained DI…
▽ More
Recent developments in self-supervised learning (SSL) have made it possible to learn data representations without the need for annotations. Inspired by the non-contrastive SSL approach (SimSiam), we introduce a novel framework SIMSAM to compute the Semantic Affinity Matrix, which is significant for unsupervised image segmentation. Given an image, SIMSAM first extracts features using pre-trained DINO-ViT, then projects the features to predict the correlations of dense features in a non-contrastive way. We show applications of the Semantic Affinity Matrix in object segmentation and semantic segmentation tasks. Our code is available at https://github.com/chandagrover/SimSAM.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
ICU-Sepsis: A Benchmark MDP Built from Real Medical Data
Authors:
Kartik Choudhary,
Dhawal Gupta,
Philip S. Thomas
Abstract:
We present ICU-Sepsis, an environment that can be used in benchmarks for evaluating reinforcement learning (RL) algorithms. Sepsis management is a complex task that has been an important topic in applied RL research in recent years. Therefore, MDPs that model sepsis management can serve as part of a benchmark to evaluate RL algorithms on a challenging real-world problem. However, creating usable M…
▽ More
We present ICU-Sepsis, an environment that can be used in benchmarks for evaluating reinforcement learning (RL) algorithms. Sepsis management is a complex task that has been an important topic in applied RL research in recent years. Therefore, MDPs that model sepsis management can serve as part of a benchmark to evaluate RL algorithms on a challenging real-world problem. However, creating usable MDPs that simulate sepsis care in the ICU remains a challenge due to the complexities involved in acquiring and processing patient data. ICU-Sepsis is a lightweight environment that models personalized care of sepsis patients in the ICU. The environment is a tabular MDP that is widely compatible and is challenging even for state-of-the-art RL algorithms, making it a valuable tool for benchmarking their performance. However, we emphasize that while ICU-Sepsis provides a standardized environment for evaluating RL algorithms, it should not be used to draw conclusions that guide medical practice.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI
Authors:
Maria Correia de Verdier,
Rachit Saluja,
Louis Gagnon,
Dominic LaBella,
Ujjwall Baid,
Nourel Hoda Tahon,
Martha Foltyn-Dumitru,
Jikai Zhang,
Maram Alafif,
Saif Baig,
Ken Chang,
Gennaro D'Anna,
Lisa Deptula,
Diviya Gupta,
Muhammad Ammar Haider,
Ali Hussain,
Michael Iv,
Marinos Kontzialis,
Paul Manning,
Farzan Moodi,
Teresa Nunes,
Aaron Simon,
Nico Sollmann,
David Vu,
Maruf Adewole
, et al. (60 additional authors not shown)
Abstract:
Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key r…
▽ More
Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key role in treatment planning and post-treatment longitudinal assessment. The 2024 Brain Tumor Segmentation (BraTS) challenge on post-treatment glioma MRI will provide a community standard and benchmark for state-of-the-art automated segmentation models based on the largest expert-annotated post-treatment glioma MRI dataset. Challenge competitors will develop automated segmentation models to predict four distinct tumor sub-regions consisting of enhancing tissue (ET), surrounding non-enhancing T2/fluid-attenuated inversion recovery (FLAIR) hyperintensity (SNFH), non-enhancing tumor core (NETC), and resection cavity (RC). Models will be evaluated on separate validation and test datasets using standardized performance metrics utilized across the BraTS 2024 cluster of challenges, including lesion-wise Dice Similarity Coefficient and Hausdorff Distance. Models developed during this challenge will advance the field of automated MRI segmentation and contribute to their integration into clinical practice, ultimately enhancing patient care.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
UAV-assisted C-RAN for On-demand Cellular Coverage: Opportunities and Challenges
Authors:
Byomakesh Mahapatra,
Deepika Gupta,
Pankaj Kumar Sharma
Abstract:
The deployment of beyond fifth-generation (5G) infrastructure over disaster-affected regions, temporary hotspot situations (e.g., massive gatherings, etc.), complex terrains (e.g., sea, hills, marshes, etc.) poses numerous challenges for cellular service providers. Recently, unmanned aerial vehicles (UAVs) have emerged as potential candidates to overcome the aforementioned technical issues based o…
▽ More
The deployment of beyond fifth-generation (5G) infrastructure over disaster-affected regions, temporary hotspot situations (e.g., massive gatherings, etc.), complex terrains (e.g., sea, hills, marshes, etc.) poses numerous challenges for cellular service providers. Recently, unmanned aerial vehicles (UAVs) have emerged as potential candidates to overcome the aforementioned technical issues based on their multi-role capabilities to serve as aerial base stations, mobile relays, and flying wireless access points. As such, the UAVs can act as portable platforms that can be deployed immediately on demand without requiring massive ground infrastructure to support wireless services. This article introduces the integration of UAVs to cloud radio access networks (C-RAN) for beyond 5G applications. The article mainly focuses on the underlying opportunities and challenges to realize the UAV-assisted C-RAN (UC-RAN) architecture in view of three generic application scenarios, i.e., disaster management, hotspots, and complex terrains. A preliminary performance analysis via simulation is further provided for the proposed UC-RAN under hotspot application scenario based on the relevant metrics.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Surgical Feature-Space Decomposition of LLMs: Why, When and How?
Authors:
Arnav Chavan,
Nahush Lele,
Deepak Gupta
Abstract:
Low-rank approximations, of the weight and feature space can enhance the performance of deep learning models, whether in terms of improving generalization or reducing the latency of inference. However, there is no clear consensus yet on \emph{how}, \emph{when} and \emph{why} these approximations are helpful for large language models (LLMs). In this work, we empirically study the efficacy of weight…
▽ More
Low-rank approximations, of the weight and feature space can enhance the performance of deep learning models, whether in terms of improving generalization or reducing the latency of inference. However, there is no clear consensus yet on \emph{how}, \emph{when} and \emph{why} these approximations are helpful for large language models (LLMs). In this work, we empirically study the efficacy of weight and feature space decomposition in transformer-based LLMs. We demonstrate that surgical decomposition not only provides critical insights into the trade-off between compression and language modelling performance, but also sometimes enhances commonsense reasoning performance of LLMs. Our empirical analysis identifies specific network segments that intrinsically exhibit a low-rank structure. Furthermore, we extend our investigation to the implications of low-rank approximations on model bias. Overall, our findings offer a novel perspective on optimizing LLMs, presenting the low-rank approximation not only as a tool for performance enhancements, but also as a means to potentially rectify biases within these models. Our code is available at \href{https://github.com/nyunAI/SFSD-LLM}{GitHub}.
△ Less
Submitted 17 May, 2024;
originally announced May 2024.
-
Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting
Authors:
Divij Gupta,
Anubhav Bhatti,
Suraj Parmar,
Chen Dan,
Yuwei Liu,
Bingjie Shen,
San Lee
Abstract:
Low-Rank Adaptation (LoRA) is a widely used technique for fine-tuning large pre-trained or foundational models across different modalities and tasks. However, its application to time series data, particularly within foundational models, remains underexplored. This paper examines the impact of LoRA on contemporary time series foundational models: Lag-Llama, MOIRAI, and Chronos. We demonstrate LoRA'…
▽ More
Low-Rank Adaptation (LoRA) is a widely used technique for fine-tuning large pre-trained or foundational models across different modalities and tasks. However, its application to time series data, particularly within foundational models, remains underexplored. This paper examines the impact of LoRA on contemporary time series foundational models: Lag-Llama, MOIRAI, and Chronos. We demonstrate LoRA's fine-tuning potential for forecasting the vital signs of sepsis patients in intensive care units (ICUs), emphasizing the models' adaptability to previously unseen, out-of-domain modalities. Integrating LoRA aims to enhance forecasting performance while reducing inefficiencies associated with fine-tuning large models on limited domain-specific data. Our experiments show that LoRA fine-tuning of time series foundational models significantly improves forecasting, achieving results comparable to state-of-the-art models trained from scratch on similar modalities. We conduct comprehensive ablation studies to demonstrate the trade-offs between the number of tunable parameters and forecasting performance and assess the impact of varying LoRA matrix ranks on model performance.
△ Less
Submitted 16 May, 2024;
originally announced May 2024.
-
Learning label-label correlations in Extreme Multi-label Classification via Label Features
Authors:
Siddhant Kharbanda,
Devaansh Gupta,
Erik Schultheis,
Atmadeep Banerjee,
Cho-Jui Hsieh,
Rohit Babbar
Abstract:
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input with a subset of most relevant labels from millions of label choices. Recent works in this domain have increasingly focused on a symmetric problem setting where both input instances and label features are short-text in nature. Short-text XMC with label features has found numerous applications in a…
▽ More
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input with a subset of most relevant labels from millions of label choices. Recent works in this domain have increasingly focused on a symmetric problem setting where both input instances and label features are short-text in nature. Short-text XMC with label features has found numerous applications in areas such as query-to-ad-phrase matching in search ads, title-based product recommendation, prediction of related searches. In this paper, we propose Gandalf, a novel approach which makes use of a label co-occurrence graph to leverage label features as additional data points to supplement the training distribution. By exploiting the characteristics of the short-text XMC problem, it leverages the label features to construct valid training instances, and uses the label graph for generating the corresponding soft-label targets, hence effectively capturing the label-label correlations. Surprisingly, models trained on these new training instances, although being less than half of the original dataset, can outperform models trained on the original dataset, particularly on the PSP@k metric for tail labels. With this insight, we aim to train existing XMC algorithms on both, the original and new training instances, leading to an average 5% relative improvements for 6 state-of-the-art algorithms across 4 benchmark datasets consisting of up to 1.3M labels. Gandalf can be applied in a plug-and-play manner to various methods and thus forwards the state-of-the-art in the domain, without incurring any additional computational overheads.
△ Less
Submitted 3 May, 2024;
originally announced May 2024.
-
UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification
Authors:
Siddhant Kharbanda,
Devaansh Gupta,
Gururaj K,
Pankaj Malhotra,
Cho-Jui Hsieh,
Rohit Babbar
Abstract:
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space, given an input query and labels with textual features. Models developed for this problem have conventionally used modular approach with (i) a Dual Encoder (DE) to embed the queries and label texts, (ii) a One-vs-All classifier to rerank the shortlisted labels mined through…
▽ More
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space, given an input query and labels with textual features. Models developed for this problem have conventionally used modular approach with (i) a Dual Encoder (DE) to embed the queries and label texts, (ii) a One-vs-All classifier to rerank the shortlisted labels mined through meta-classifier training. While such methods have shown empirical success, we observe two key uncharted aspects, (i) DE training typically uses only a single positive relation even for datasets which offer more, (ii) existing approaches fixate on using only OvA reduction of the multi-label problem. This work aims to explore these aspects by proposing UniDEC, a novel end-to-end trainable framework which trains the dual encoder and classifier in together in a unified fashion using a multi-class loss. For the choice of multi-class loss, the work proposes a novel pick-some-label (PSL) reduction of the multi-label problem with leverages multiple (in come cases, all) positives. The proposed framework achieves state-of-the-art results on a single GPU, while achieving on par results with respect to multi-GPU SOTA methods on various XML benchmark datasets, all while using 4-16x lesser compute and being practically scalable even beyond million label scale datasets.
△ Less
Submitted 4 May, 2024;
originally announced May 2024.
-
Interpretable Vital Sign Forecasting with Model Agnostic Attention Maps
Authors:
Yuwei Liu,
Chen Dan,
Anubhav Bhatti,
Bingjie Shen,
Divij Gupta,
Suraj Parmar,
San Lee
Abstract:
Sepsis is a leading cause of mortality in intensive care units (ICUs), representing a substantial medical challenge. The complexity of analyzing diverse vital signs to predict sepsis further aggravates this issue. While deep learning techniques have been advanced for early sepsis prediction, their 'black-box' nature obscures the internal logic, impairing interpretability in critical settings like…
▽ More
Sepsis is a leading cause of mortality in intensive care units (ICUs), representing a substantial medical challenge. The complexity of analyzing diverse vital signs to predict sepsis further aggravates this issue. While deep learning techniques have been advanced for early sepsis prediction, their 'black-box' nature obscures the internal logic, impairing interpretability in critical settings like ICUs. This paper introduces a framework that combines a deep learning model with an attention mechanism that highlights the critical time steps in the forecasting process, thus improving model interpretability and supporting clinical decision-making. We show that the attention mechanism could be adapted to various black box time series forecasting models such as N-HiTS and N-BEATS. Our method preserves the accuracy of conventional deep learning models while enhancing interpretability through attention-weight-generated heatmaps. We evaluated our model on the eICU-CRD dataset, focusing on forecasting vital signs for sepsis patients. We assessed its performance using mean squared error (MSE) and dynamic time warping (DTW) metrics. We explored the attention maps of N-HiTS and N-BEATS, examining the differences in their performance and identifying crucial factors influencing vital sign forecasting.
△ Less
Submitted 21 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Self-healing Nodes with Adaptive Data-Sharding
Authors:
Ayush Thakur,
Sanskar Chauhan,
Ilisha Tomar,
Vaibhavi Paul,
Deepak Gupta
Abstract:
Data sharding, a technique for partitioning and distributing data among multiple servers or nodes, offers enhancements in the scalability, performance, and fault tolerance of extensive distributed systems. Nonetheless, this strategy introduces novel challenges, including load balancing among shards, management of node failures and data loss, and adaptation to evolving data and workload patterns. T…
▽ More
Data sharding, a technique for partitioning and distributing data among multiple servers or nodes, offers enhancements in the scalability, performance, and fault tolerance of extensive distributed systems. Nonetheless, this strategy introduces novel challenges, including load balancing among shards, management of node failures and data loss, and adaptation to evolving data and workload patterns. This paper proposes an innovative approach to tackle these challenges by empowering self-healing nodes with adaptive data sharding. Leveraging concepts such as self-replication, fractal regeneration, sentient data sharding, and symbiotic node clusters, our approach establishes a dynamic and resilient data sharding scheme capable of addressing diverse scenarios and meeting varied requirements. Implementation and evaluation of our approach involve a prototype system simulating a large-scale distributed database across various data sharding scenarios. Comparative analyses against existing data sharding techniques highlight the superior scalability, performance, fault tolerance, and adaptability of our approach. Additionally, the paper delves into potential applications and limitations, providing insights into the future research directions that can further advance this innovative approach.
△ Less
Submitted 19 January, 2024;
originally announced May 2024.
-
PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification
Authors:
Leon Garza,
Lavanya Elluri,
Anantaa Kotal,
Aritran Piplai,
Deepti Gupta,
Anupam Joshi
Abstract:
Data protection and privacy is becoming increasingly crucial in the digital era. Numerous companies depend on third-party vendors and service providers to carry out critical functions within their operations, encompassing tasks such as data handling and storage. However, this reliance introduces potential vulnerabilities, as these vendors' security measures and practices may not always align with…
▽ More
Data protection and privacy is becoming increasingly crucial in the digital era. Numerous companies depend on third-party vendors and service providers to carry out critical functions within their operations, encompassing tasks such as data handling and storage. However, this reliance introduces potential vulnerabilities, as these vendors' security measures and practices may not always align with the standards expected by regulatory bodies. Businesses are required, often under the penalty of law, to ensure compliance with the evolving regulatory rules. Interpreting and implementing these regulations pose challenges due to their complexity. Regulatory documents are extensive, demanding significant effort for interpretation, while vendor-drafted privacy policies often lack the detail required for full legal compliance, leading to ambiguity. To ensure a concise interpretation of the regulatory requirements and compliance of organizational privacy policy with said regulations, we propose a Large Language Model (LLM) and Semantic Web based approach for privacy compliance. In this paper, we develop the novel Privacy Policy Compliance Verification Knowledge Graph, PrivComp-KG. It is designed to efficiently store and retrieve comprehensive information concerning privacy policies, regulatory frameworks, and domain-specific knowledge pertaining to the legal landscape of privacy. Using Retrieval Augmented Generation, we identify the relevant sections in a privacy policy with corresponding regulatory rules. This information about individual privacy policies is populated into the PrivComp-KG. Combining this with the domain context and rules, the PrivComp-KG can be queried to check for compliance with privacy policies by each vendor against relevant policy regulations. We demonstrate the relevance of the PrivComp-KG, by verifying compliance of privacy policy documents for various organizations.
△ Less
Submitted 30 April, 2024;
originally announced April 2024.
-
Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models
Authors:
Ying-Chun Lin,
Jennifer Neville,
Jack W. Stokes,
Longqi Yang,
Tara Safavi,
Mengting Wan,
Scott Counts,
Siddharth Suri,
Reid Andersen,
Xiaofeng Xu,
Deepak Gupta,
Sujay Kumar Jauhar,
Xia Song,
Georg Buscher,
Saurabh Tiwary,
Brent Hecht,
Jaime Teevan
Abstract:
Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featur…
▽ More
Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.
△ Less
Submitted 8 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs
Authors:
Tanmay Rajore,
Nishanth Chandran,
Sunayana Sitaram,
Divya Gupta,
Rahul Sharma,
Kashish Mittal,
Manohar Swaminathan
Abstract:
Benchmarking is the de-facto standard for evaluating LLMs, due to its speed, replicability and low cost. However, recent work has pointed out that the majority of the open source benchmarks available today have been contaminated or leaked into LLMs, meaning that LLMs have access to test data during pretraining and/or fine-tuning. This raises serious concerns about the validity of benchmarking stud…
▽ More
Benchmarking is the de-facto standard for evaluating LLMs, due to its speed, replicability and low cost. However, recent work has pointed out that the majority of the open source benchmarks available today have been contaminated or leaked into LLMs, meaning that LLMs have access to test data during pretraining and/or fine-tuning. This raises serious concerns about the validity of benchmarking studies conducted so far and the future of evaluation using benchmarks. To solve this problem, we propose Private Benchmarking, a solution where test datasets are kept private and models are evaluated without revealing the test data to the model. We describe various scenarios (depending on the trust placed on model owners or dataset owners), and present solutions to avoid data contamination using private benchmarking. For scenarios where the model weights need to be kept private, we describe solutions from confidential computing and cryptography that can aid in private benchmarking. We build an end-to-end system, TRUCE, that enables such private benchmarking showing that the overheads introduced to protect models and benchmark are negligible (in the case of confidential computing) and tractable (when cryptographic security is required). Finally, we also discuss solutions to the problem of benchmark dataset auditing, to ensure that private benchmarks are of sufficiently high quality.
△ Less
Submitted 24 June, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
TrustRate: A Decentralized Platform for Hijack-Resistant Anonymous Reviews
Authors:
Rohit Dwivedula,
Sriram Sridhar,
Sambhav Satija,
Muthian Sivathanu,
Nishanth Chandran,
Divya Gupta,
Satya Lokam
Abstract:
Reviews and ratings by users form a central component in several widely used products today (e.g., product reviews, ratings of online content, etc.), but today's platforms for managing such reviews are ad-hoc and vulnerable to various forms of tampering and hijack by fake reviews either by bots or motivated paid workers. We define a new metric called 'hijack-resistance' for such review platforms,…
▽ More
Reviews and ratings by users form a central component in several widely used products today (e.g., product reviews, ratings of online content, etc.), but today's platforms for managing such reviews are ad-hoc and vulnerable to various forms of tampering and hijack by fake reviews either by bots or motivated paid workers. We define a new metric called 'hijack-resistance' for such review platforms, and then present TrustRate, an end-to-end decentralized, hijack-resistant platform for authentic, anonymous, tamper-proof reviews. With a prototype implementation and evaluation at the scale of thousands of nodes, we demonstrate the efficacy and performance of our platform, towards a new paradigm for building products based on trusted reviews by end users without having to trust a single organization that manages the reviews.
△ Less
Submitted 23 May, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Beyond Uniform Scaling: Exploring Depth Heterogeneity in Neural Architectures
Authors:
Akash Guna R. T,
Arnav Chavan,
Deepak Gupta
Abstract:
Conventional scaling of neural networks typically involves designing a base network and growing different dimensions like width, depth, etc. of the same by some predefined scaling factors. We introduce an automated scaling approach leveraging second-order loss landscape information. Our method is flexible towards skip connections a mainstay in modern vision transformers. Our training-aware method…
▽ More
Conventional scaling of neural networks typically involves designing a base network and growing different dimensions like width, depth, etc. of the same by some predefined scaling factors. We introduce an automated scaling approach leveraging second-order loss landscape information. Our method is flexible towards skip connections a mainstay in modern vision transformers. Our training-aware method jointly scales and trains transformers without additional training iterations. Motivated by the hypothesis that not all neurons need uniform depth complexity, our approach embraces depth heterogeneity. Extensive evaluations on DeiT-S with ImageNet100 show a 2.5% accuracy gain and 10% parameter efficiency improvement over conventional scaling. Scaled networks demonstrate superior performance upon training small scale datasets from scratch. We introduce the first intact scaling mechanism for vision transformers, a step towards efficient model scaling.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
Authors:
Arnav Chavan,
Raghav Magazine,
Shubham Kushwaha,
Mérouane Debbah,
Deepak Gupta
Abstract:
Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluat…
▽ More
Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluate various compression techniques, providing practical insights for efficient LLM deployment in a unified setting. The empirical analysis on LLaMA(/2)-7B highlights the effectiveness of these methods. Drawing from survey insights, we identify current limitations and discuss potential future directions to improve LLM inference efficiency. We release the codebase to reproduce the results presented in this paper at https://github.com/nyunAI/Faster-LLM-Survey
△ Less
Submitted 24 April, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
A Learning-based Declarative Privacy-Preserving Framework for Federated Data Management
Authors:
Hong Guan,
Summer Gautier,
Deepti Gupta,
Rajan Hari Ambrish,
Yancheng Wang,
Harsha Lakamsani,
Dhanush Giriyan,
Saajan Maslanka,
Chaowei Xiao,
Yingzhen Yang,
Jia Zou
Abstract:
It is challenging to balance the privacy and accuracy for federated query processing over multiple private data silos. In this work, we will demonstrate an end-to-end workflow for automating an emerging privacy-preserving technique that uses a deep learning model trained using the Differentially-Private Stochastic Gradient Descent (DP-SGD) algorithm to replace portions of actual data to answer a q…
▽ More
It is challenging to balance the privacy and accuracy for federated query processing over multiple private data silos. In this work, we will demonstrate an end-to-end workflow for automating an emerging privacy-preserving technique that uses a deep learning model trained using the Differentially-Private Stochastic Gradient Descent (DP-SGD) algorithm to replace portions of actual data to answer a query. Our proposed novel declarative privacy-preserving workflow allows users to specify "what private information to protect" rather than "how to protect". Under the hood, the system automatically chooses query-model transformation plans as well as hyper-parameters. At the same time, the proposed workflow also allows human experts to review and tune the selected privacy-preserving mechanism for audit/compliance, and optimization purposes.
△ Less
Submitted 22 January, 2024;
originally announced January 2024.
-
From Past to Future: Rethinking Eligibility Traces
Authors:
Dhawal Gupta,
Scott M. Jordan,
Shreyas Chaudhari,
Bo Liu,
Philip S. Thomas,
Bruno Castro da Silva
Abstract:
In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we delve into the nuances of eligibility traces and explore instances where their updates may result in unexpected credit assignment to preceding states. From this investigation emerges the concept of a novel value function, which we refer to as the \emph{bidirectional value functio…
▽ More
In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we delve into the nuances of eligibility traces and explore instances where their updates may result in unexpected credit assignment to preceding states. From this investigation emerges the concept of a novel value function, which we refer to as the \emph{bidirectional value function}. Unlike traditional state value functions, bidirectional value functions account for both future expected returns (rewards anticipated from the current state onward) and past expected returns (cumulative rewards from the episode's start to the present). We derive principled update equations to learn this value function and, through experimentation, demonstrate its efficacy in enhancing the process of policy evaluation. In particular, our results indicate that the proposed learning approach can, in certain challenging contexts, perform policy evaluation more rapidly than TD($λ$) -- a method that learns forward value functions, $v^π$, \emph{directly}. Overall, our findings present a new perspective on eligibility traces and potential advantages associated with the novel value function it inspires, especially for policy evaluation.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Authors:
Arnav Chavan,
Nahush Lele,
Deepak Gupta
Abstract:
Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges, particularly on consumer-grade hardware. This paper introduces an innovative approach for the parametric and practical compression of LLMs based on reduced order…
▽ More
Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges, particularly on consumer-grade hardware. This paper introduces an innovative approach for the parametric and practical compression of LLMs based on reduced order modelling, which entails low-rank decomposition within the feature space and re-parameterization in the weight space. Notably, this compression technique operates in a layer-wise manner, obviating the need for a GPU device and enabling the compression of billion-scale models within stringent constraints of both memory and time. Our method represents a significant advancement in model compression by leveraging matrix decomposition, demonstrating superior efficacy compared to the prevailing state-of-the-art structured pruning method.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains
Authors:
Ananta Mukherjee,
Peeyush Kumar,
Boling Yang,
Nishanth Chandran,
Divya Gupta
Abstract:
This paper addresses privacy concerns in multi-agent reinforcement learning (MARL), specifically within the context of supply chains where individual strategic data must remain confidential. Organizations within the supply chain are modeled as agents, each seeking to optimize their own objectives while interacting with others. As each organization's strategy is contingent on neighboring strategies…
▽ More
This paper addresses privacy concerns in multi-agent reinforcement learning (MARL), specifically within the context of supply chains where individual strategic data must remain confidential. Organizations within the supply chain are modeled as agents, each seeking to optimize their own objectives while interacting with others. As each organization's strategy is contingent on neighboring strategies, maintaining privacy of state and action-related information is crucial. To tackle this challenge, we propose a game-theoretic, privacy-preserving mechanism, utilizing a secure multi-party computation (MPC) framework in MARL settings. Our major contribution is the successful implementation of a secure MPC framework, SecFloat on EzPC, to solve this problem. However, simply implementing policy gradient methods such as MADDPG operations using SecFloat, while conceptually feasible, would be programmatically intractable. To overcome this hurdle, we devise a novel approach that breaks down the forward and backward pass of the neural network into elementary operations compatible with SecFloat , creating efficient and secure versions of the MADDPG algorithm. Furthermore, we present a learning mechanism that carries out floating point operations in a privacy-preserving manner, an important feature for successful learning in MARL framework. Experiments reveal that there is on average 68.19% less supply chain wastage in 2 PC compared to no data share, while also giving on average 42.27% better average cumulative revenue for each player. This work paves the way for practical, privacy-preserving MARL, promising significant improvements in secure computation within supply chain contexts and broadly.
△ Less
Submitted 9 December, 2023;
originally announced December 2023.
-
Efficient Expansion and Gradient Based Task Inference for Replay Free Incremental Learning
Authors:
Soumya Roy,
Vinay K Verma,
Deepak Gupta
Abstract:
This paper proposes a simple but highly efficient expansion-based model for continual learning. The recent feature transformation, masking and factorization-based methods are efficient, but they grow the model only over the global or shared parameter. Therefore, these approaches do not fully utilize the previously learned information because the same task-specific parameter forgets the earlier kno…
▽ More
This paper proposes a simple but highly efficient expansion-based model for continual learning. The recent feature transformation, masking and factorization-based methods are efficient, but they grow the model only over the global or shared parameter. Therefore, these approaches do not fully utilize the previously learned information because the same task-specific parameter forgets the earlier knowledge. Thus, these approaches show limited transfer learning ability. Moreover, most of these models have constant parameter growth for all tasks, irrespective of the task complexity. Our work proposes a simple filter and channel expansion based method that grows the model over the previous task parameters and not just over the global parameter. Therefore, it fully utilizes all the previously learned information without forgetting, which results in better knowledge transfer. The growth rate in our proposed model is a function of task complexity; therefore for a simple task, the model has a smaller parameter growth while for complex tasks, the model requires more parameters to adapt to the current task. Recent expansion based models show promising results for task incremental learning (TIL). However, for class incremental learning (CIL), prediction of task id is a crucial challenge; hence, their results degrade rapidly as the number of tasks increase. In this work, we propose a robust task prediction method that leverages entropy weighted data augmentations and the models gradient using pseudo labels. We evaluate our model on various datasets and architectures in the TIL, CIL and generative continual learning settings. The proposed approach shows state-of-the-art results in all these settings. Our extensive ablation studies show the efficacy of the proposed components.
△ Less
Submitted 2 December, 2023;
originally announced December 2023.
-
Developmental Pretraining (DPT) for Image Classification Networks
Authors:
Niranjan Rajesh,
Debayan Gupta
Abstract:
In the backdrop of increasing data requirements of Deep Neural Networks for object recognition that is growing more untenable by the day, we present Developmental PreTraining (DPT) as a possible solution. DPT is designed as a curriculum-based pre-training approach designed to rival traditional pre-training techniques that are data-hungry. These training approaches also introduce unnecessary featur…
▽ More
In the backdrop of increasing data requirements of Deep Neural Networks for object recognition that is growing more untenable by the day, we present Developmental PreTraining (DPT) as a possible solution. DPT is designed as a curriculum-based pre-training approach designed to rival traditional pre-training techniques that are data-hungry. These training approaches also introduce unnecessary features that could be misleading when the network is employed in a downstream classification task where the data is sufficiently different from the pre-training data and is scarce. We design the curriculum for DPT by drawing inspiration from human infant visual development. DPT employs a phased approach where carefully-selected primitive and universal features like edges and shapes are taught to the network participating in our pre-training regime. A model that underwent the DPT regime is tested against models with randomised weights to evaluate the viability of DPT.
△ Less
Submitted 30 November, 2023;
originally announced December 2023.
-
Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules for Secure and Confidential Data Synthesis
Authors:
Anantaa Kotal,
Lavanya Elluri,
Deepti Gupta,
Varun Mandalapu,
Anupam Joshi
Abstract:
Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and i…
▽ More
Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy regulations, such as the EU GDPR, the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, there is a lack of compliance with documented data privacy policies in such privacy-preserving efforts. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats and secure data based on applicable regulatory policy rules.
△ Less
Submitted 26 November, 2023;
originally announced November 2023.
-
Gender-Based Comparative Study of Type 2 Diabetes Risk Factors in Kolkata, India: A Machine Learning Approach
Authors:
Rahul Jain,
Anoushka Saha,
Gourav Daga,
Durba Bhattacharya,
Madhura Das Gupta,
Sourav Chowdhury,
Suparna Roychowdhury
Abstract:
Type 2 diabetes mellitus represents a prevalent and widespread global health concern, necessitating a comprehensive assessment of its risk factors. This study aimed towards learning whether there is any differential impact of age, Lifestyle, BMI and Waist to height ratio on the risk of Type 2 diabetes mellitus in males and females in Kolkata, West Bengal, India based on a sample observed from the…
▽ More
Type 2 diabetes mellitus represents a prevalent and widespread global health concern, necessitating a comprehensive assessment of its risk factors. This study aimed towards learning whether there is any differential impact of age, Lifestyle, BMI and Waist to height ratio on the risk of Type 2 diabetes mellitus in males and females in Kolkata, West Bengal, India based on a sample observed from the out-patient consultation department of Belle Vue Clinic in Kolkata. Various machine learning models like Logistic Regression, Random Forest, and Support Vector Classifier, were used to predict the risk of diabetes, and performance was compared based on different predictors. Our findings indicate a significant age-related increase in risk of diabetes for both males and females. Although exercising and BMI was found to have significant impact on the risk of Type 2 diabetes in males, in females both turned out to be statistically insignificant. For both males and females, predictive models based on WhtR demonstrated superior performance in risk assessment compared to those based on BMI. This study sheds light on the gender-specific differences in the risk factors for Type 2 diabetes, offering valuable insights that can be used towards more targeted healthcare interventions and public health strategies.
△ Less
Submitted 14 October, 2023;
originally announced November 2023.
-
Evolving Domain Adaptation of Pretrained Language Models for Text Classification
Authors:
Yun-Shiuan Chuang,
Yi Wu,
Dhruv Gupta,
Rheeya Uppaal,
Ananya Kumar,
Luhang Sun,
Makesh Narsimhan Sreedhar,
Sijia Yang,
Timothy T. Rogers,
Junjie Hu
Abstract:
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self…
▽ More
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method. Our analysis across various datasets reveals that this incremental method excels at adapting PLMs to EDS, outperforming traditional domain adaptation techniques. These findings highlight the importance of continually updating PLMs to ensure their effectiveness in real-world applications, paving the way for future research into PLM robustness against the natural temporal evolution of language.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
Deep Learning meets Blockchain for Automated and Secure Access Control
Authors:
Asma Jodeiri Akbarfam,
Sina Barazandeh,
Deepti Gupta,
Hoda Maleki
Abstract:
Access control is a critical component of computer security, governing access to system resources. However, designing policies and roles in traditional access control can be challenging and difficult to maintain in dynamic and complex systems, which is particularly problematic for organizations with numerous resources. Furthermore, traditional methods suffer from issues such as third-party involve…
▽ More
Access control is a critical component of computer security, governing access to system resources. However, designing policies and roles in traditional access control can be challenging and difficult to maintain in dynamic and complex systems, which is particularly problematic for organizations with numerous resources. Furthermore, traditional methods suffer from issues such as third-party involvement, inefficiency, and privacy gaps, making transparent and dynamic access control an ongoing research problem. Moreover detecting malicious activities and identifying users who are not behaving appropriately can present notable difficulties. To address these challenges, we propose DLACB, a Deep Learning Based Access Control Using Blockchain, as a solution to decentralized access control. DLACB uses blockchain to provide transparency, traceability, and reliability in various domains such as medicine, finance, and government while taking advantage of deep learning to not rely on predefined policies and eventually automate access control. With the integration of blockchain and deep learning for access control, DLACB can provide a general framework applicable to various domains, enabling transparent and reliable logging of all transactions. As all data is recorded on the blockchain, we have the capability to identify malicious activities. We store a list of malicious activities in the storage system and employ a verification algorithm to cross-reference it with the blockchain. We conduct measurements and comparisons of the smart contract processing time for the deployed access control system in contrast to traditional access control methods, determining the time overhead involved. The processing time of DLBAC demonstrates remarkable stability when exposed to increased request volumes.
△ Less
Submitted 10 November, 2023;
originally announced November 2023.
-
EELBERT: Tiny Models through Dynamic Embeddings
Authors:
Gabrielle Cohn,
Rishika Agarwal,
Deepanshu Gupta,
Siddharth Patwardhan
Abstract:
We introduce EELBERT, an approach for compression of transformer-based models (e.g., BERT), with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e. on-the-fly, embedding computations. Since the input embedding layer accounts for a significant fraction of the model size, especially for the smaller BERT variants…
▽ More
We introduce EELBERT, an approach for compression of transformer-based models (e.g., BERT), with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e. on-the-fly, embedding computations. Since the input embedding layer accounts for a significant fraction of the model size, especially for the smaller BERT variants, replacing this layer with an embedding computation function helps us reduce the model size significantly. Empirical evaluation on the GLUE benchmark shows that our BERT variants (EELBERT) suffer minimal regression compared to the traditional BERT models. Through this approach, we are able to develop our smallest model UNO-EELBERT, which achieves a GLUE score within 4% of fully trained BERT-tiny, while being 15x smaller (1.2 MB) in size.
△ Less
Submitted 30 October, 2023;
originally announced October 2023.
-
Behavior Alignment via Reward Function Optimization
Authors:
Dhawal Gupta,
Yash Chandak,
Scott M. Jordan,
Philip S. Thomas,
Bruno Castro da Silva
Abstract:
Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outco…
▽ More
Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn \emph{behavior alignment reward functions}. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards. Our approach automatically determines the most effective way to blend these types of feedback, thereby enhancing robustness against heuristic reward misspecification. Remarkably, it can also adapt an agent's policy optimization process to mitigate suboptimalities resulting from limitations and biases inherent in the underlying RL algorithms. We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. We investigate heuristic auxiliary rewards of varying quality -- some of which are beneficial and others detrimental to the learning process. Our results show that our framework offers a robust and principled way to integrate designer-specified heuristics. It not only addresses key shortcomings of existing approaches but also consistently leads to high-performing solutions, even when given misaligned or poorly-specified auxiliary reward functions.
△ Less
Submitted 31 October, 2023; v1 submitted 29 October, 2023;
originally announced October 2023.
-
Remote Heart Rate Monitoring in Smart Environments from Videos with Self-supervised Pre-training
Authors:
Divij Gupta,
Ali Etemad
Abstract:
Recent advances in deep learning have made it increasingly feasible to estimate heart rate remotely in smart environments by analyzing videos. However, a notable limitation of deep learning methods is their heavy reliance on extensive sets of labeled data for effective training. To address this issue, self-supervised learning has emerged as a promising avenue. Building on this, we introduce a solu…
▽ More
Recent advances in deep learning have made it increasingly feasible to estimate heart rate remotely in smart environments by analyzing videos. However, a notable limitation of deep learning methods is their heavy reliance on extensive sets of labeled data for effective training. To address this issue, self-supervised learning has emerged as a promising avenue. Building on this, we introduce a solution that utilizes self-supervised contrastive learning for the estimation of remote photoplethysmography (PPG) and heart rate monitoring, thereby reducing the dependence on labeled data and enhancing performance. We propose the use of 3 spatial and 3 temporal augmentations for training an encoder through a contrastive framework, followed by utilizing the late-intermediate embeddings of the encoder for remote PPG and heart rate estimation. Our experiments on two publicly available datasets showcase the improvement of our proposed approach over several related works as well as supervised learning baselines, as our results approach the state-of-the-art. We also perform thorough experiments to showcase the effects of using different design choices such as the video representation learning method, the augmentations used in the pre-training stage, and others. We also demonstrate the robustness of our proposed method over the supervised learning approaches on reduced amounts of labeled data.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
Tight Short-Lived Signatures
Authors:
Arup Mondal,
Ruthu Hulikal Rooparaghunath,
Debayan Gupta
Abstract:
A Time-lock puzzle (TLP) sends information into the future: a predetermined number of sequential computations must occur (i.e., a predetermined amount of time must pass) to retrieve the information, regardless of parallelization. Buoyed by the excitement around secure decentralized applications and cryptocurrencies, the last decade has witnessed numerous constructions of TLP variants and related a…
▽ More
A Time-lock puzzle (TLP) sends information into the future: a predetermined number of sequential computations must occur (i.e., a predetermined amount of time must pass) to retrieve the information, regardless of parallelization. Buoyed by the excitement around secure decentralized applications and cryptocurrencies, the last decade has witnessed numerous constructions of TLP variants and related applications (e.g., cost-efficient blockchain designs, randomness beacons, e-voting, etc.).
In this poster, we first extend the notion of TLP by formally defining the "time-lock public key encryption" (TLPKE) scheme. Next, we introduce and construct a "tight short-lived signatures" scheme using our TLPKE. Furthermore, to test the validity of our proposed schemes, we do a proof-of-concept implementation and run detailed simulations.
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
Trenchcoat: Human-Computable Hashing Algorithms for Password Generation
Authors:
Ruthu Hulikal Rooparaghunath,
T. S. Harikrishnan,
Debayan Gupta
Abstract:
The average user has between 90-130 online accounts, and around $3 \times 10^{11}$ passwords are in use this year. Most people are terrible at remembering "random" passwords, so they reuse or create similar passwords using a combination of predictable words, numbers, and symbols. Previous password-generation or management protocols have imposed so large a cognitive load that users have abandoned t…
▽ More
The average user has between 90-130 online accounts, and around $3 \times 10^{11}$ passwords are in use this year. Most people are terrible at remembering "random" passwords, so they reuse or create similar passwords using a combination of predictable words, numbers, and symbols. Previous password-generation or management protocols have imposed so large a cognitive load that users have abandoned them in favor of insecure yet simpler methods (e.g., writing them down or reusing minor variants).
We describe a range of candidate human-computable "hash" functions suitable for use as password generators - as long as the human (with minimal education assumptions) keeps a single, easily-memorizable "master" secret - and rate them by various metrics, including effective security.
These functions hash master-secrets with user accounts to produce sub-secrets that can be used as passwords; $F_R($s$, w) \longrightarrow y$, takes a website $w$, produces a password $y$, parameterized by master secret $s$, which may or may not be a string.
We exploit the unique configuration $R$ of each user's associative and implicit memory (detailed in section 2) to ensure that sources of randomness unique to each user are present in each master-secret $F_R$. An adversary cannot compute or verify $F_R$ efficiently since $R$ is unique to each individual; in that sense, our hash function is similar to a physically unclonable function. For the algorithms we propose, the user need only complete primitive operations such as addition, spatial navigation or searching. Critically, most of our methods are also accessible to neurodiverse, or cognitively or physically differently-abled persons.
We present results from a survey (n=134 individuals) investigating real-world usage of these methods and how people currently come up with their passwords, we also survey 400 websites to collate current password advice.
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
RANDGENER: Distributed Randomness Beacon from Verifiable Delay Function
Authors:
Arup Mondal,
Ruthu Hulikal Rooparaghunath,
Debayan Gupta
Abstract:
Buoyed by the excitement around secure decentralized applications, the last few decades have seen numerous constructions of distributed randomness beacons (DRB) along with use cases; however, a secure DRB (in many variations) remains an open problem. We further note that it is natural to want some kind of reward for participants who spend time and energy evaluating the randomness beacon value -- t…
▽ More
Buoyed by the excitement around secure decentralized applications, the last few decades have seen numerous constructions of distributed randomness beacons (DRB) along with use cases; however, a secure DRB (in many variations) remains an open problem. We further note that it is natural to want some kind of reward for participants who spend time and energy evaluating the randomness beacon value -- this is already common in distributed protocols.
In this work, we present RandGener, a novel $n$-party commit-reveal-recover (or collaborative) DRB protocol with a novel reward and penalty mechanism along with a set of realistic guarantees. We design our protocol using trapdoor watermarkable verifiable delay functions in the RSA group setting (without requiring a trusted dealer or distributed key generation).
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
Multi-modal Medical Neurological Image Fusion using Wavelet Pooled Edge Preserving Autoencoder
Authors:
Manisha Das,
Deep Gupta,
Petia Radeva,
Ashwini M Bakde
Abstract:
Medical image fusion integrates the complementary diagnostic information of the source image modalities for improved visualization and analysis of underlying anomalies. Recently, deep learning-based models have excelled the conventional fusion methods by executing feature extraction, feature selection, and feature fusion tasks, simultaneously. However, most of the existing convolutional neural net…
▽ More
Medical image fusion integrates the complementary diagnostic information of the source image modalities for improved visualization and analysis of underlying anomalies. Recently, deep learning-based models have excelled the conventional fusion methods by executing feature extraction, feature selection, and feature fusion tasks, simultaneously. However, most of the existing convolutional neural network (CNN) architectures use conventional pooling or strided convolutional strategies to downsample the feature maps. It causes the blurring or loss of important diagnostic information and edge details available in the source images and dilutes the efficacy of the feature extraction process. Therefore, this paper presents an end-to-end unsupervised fusion model for multimodal medical images based on an edge-preserving dense autoencoder network. In the proposed model, feature extraction is improved by using wavelet decomposition-based attention pooling of feature maps. This helps in preserving the fine edge detail information present in both the source images and enhances the visual perception of fused images. Further, the proposed model is trained on a variety of medical image pairs which helps in capturing the intensity distributions of the source images and preserves the diagnostic information effectively. Substantial experiments are conducted which demonstrate that the proposed method provides improved visual and quantitative results as compared to the other state-of-the-art fusion methods.
△ Less
Submitted 18 October, 2023;
originally announced October 2023.
-
A New Multimodal Medical Image Fusion based on Laplacian Autoencoder with Channel Attention
Authors:
Payal Wankhede,
Manisha Das,
Deep Gupta,
Petia Radeva,
Ashwini M Bakde
Abstract:
Medical image fusion combines the complementary information of multimodal medical images to assist medical professionals in the clinical diagnosis of patients' disorders and provide guidance during preoperative and intra-operative procedures. Deep learning (DL) models have achieved end-to-end image fusion with highly robust and accurate fusion performance. However, most DL-based fusion models perf…
▽ More
Medical image fusion combines the complementary information of multimodal medical images to assist medical professionals in the clinical diagnosis of patients' disorders and provide guidance during preoperative and intra-operative procedures. Deep learning (DL) models have achieved end-to-end image fusion with highly robust and accurate fusion performance. However, most DL-based fusion models perform down-sampling on the input images to minimize the number of learnable parameters and computations. During this process, salient features of the source images become irretrievable leading to the loss of crucial diagnostic edge details and contrast of various brain tissues. In this paper, we propose a new multimodal medical image fusion model is proposed that is based on integrated Laplacian-Gaussian concatenation with attention pooling (LGCA). We prove that our model preserves effectively complementary information and important tissue structures.
△ Less
Submitted 18 October, 2023;
originally announced October 2023.
-
OPPO: An Ontology for Describing Fine-Grained Data Practices in Privacy Policies of Online Social Networks
Authors:
Sanonda Datta Gupta,
Torsten Hahmann
Abstract:
Privacy policies outline the data practices of Online Social Networks (OSN) to comply with privacy regulations such as the EU-GDPR and CCPA. Several ontologies for modeling privacy regulations, policies, and compliance have emerged in recent years. However, they are limited in various ways: (1) they specifically model what is required of privacy policies according to one specific privacy regulatio…
▽ More
Privacy policies outline the data practices of Online Social Networks (OSN) to comply with privacy regulations such as the EU-GDPR and CCPA. Several ontologies for modeling privacy regulations, policies, and compliance have emerged in recent years. However, they are limited in various ways: (1) they specifically model what is required of privacy policies according to one specific privacy regulation such as GDPR; (2) they provide taxonomies of concepts but are not sufficiently axiomatized to afford automated reasoning with them; and (3) they do not model data practices of privacy policies in sufficient detail to allow assessing the transparency of policies. This paper presents an OWL Ontology for Privacy Policies of OSNs, OPPO, that aims to fill these gaps by formalizing detailed data practices from OSNS' privacy policies. OPPO is grounded in BFO, IAO, OMRSE, and OBI, and its design is guided by the use case of representing and reasoning over the content of OSNs' privacy policies and evaluating policies' transparency in greater detail.
△ Less
Submitted 27 September, 2023;
originally announced September 2023.
-
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Authors:
Deepak Gupta,
Kush Attal,
Dina Demner-Fushman
Abstract:
The increase in the availability of online videos has transformed the way we access information and knowledge. A growing number of individuals now prefer instructional videos as they offer a series of step-by-step procedures to accomplish particular tasks. The instructional videos from the medical domain may provide the best possible visual answers to first aid, medical emergency, and medical educ…
▽ More
The increase in the availability of online videos has transformed the way we access information and knowledge. A growing number of individuals now prefer instructional videos as they offer a series of step-by-step procedures to accomplish particular tasks. The instructional videos from the medical domain may provide the best possible visual answers to first aid, medical emergency, and medical education questions. Toward this, this paper is focused on answering health-related questions asked by the public by providing visual answers from medical videos. The scarcity of large-scale datasets in the medical domain is a key challenge that hinders the development of applications that can help the public with their health-related questions. To address this issue, we first proposed a pipelined approach to create two large-scale datasets: HealthVidQA-CRF and HealthVidQA-Prompt. Later, we proposed monomodal and multimodal approaches that can effectively provide visual answers from medical videos to natural language questions. We conducted a comprehensive analysis of the results, focusing on the impact of the created datasets on model training and the significance of visual features in enhancing the performance of the monomodal and multi-modal approaches. Our findings suggest that these datasets have the potential to enhance the performance of medical visual answer localization tasks and provide a promising future direction to further enhance the performance by using pre-trained language-vision models.
△ Less
Submitted 21 September, 2023;
originally announced September 2023.
-
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF
Authors:
Simeng Sun,
Dhawal Gupta,
Mohit Iyyer
Abstract:
During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources. In this technical report, we empirically investigate an efficient implementation of RLHF using low-rank adaptation (LoRA), which allows us to align the LLaMA 7B checkpoint on the Alpaca dataset using only two A100 GPUs instead of…
▽ More
During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources. In this technical report, we empirically investigate an efficient implementation of RLHF using low-rank adaptation (LoRA), which allows us to align the LLaMA 7B checkpoint on the Alpaca dataset using only two A100 GPUs instead of the eight required for full model fine-tuning. Despite tuning only 0.2% of LLaMA 7B's parameters, our implementation achieves better performance than the publicly-released AlpacaFarm checkpoint with full model fine-tuning. Next, we analyze several configurations of our LoRA-based PPO implementation, varying the form of the KL regularization term in the training objective. We find that (1) removing this penalty term does not harm performance on the AlpacaFarm evaluation set under our LoRA setup; (2) other regularizers, such as Jensen-Shannon divergence, lead to improved performance; and (3) while PPO training negatively impacts the factuality of model-generated responses, training with LoRA largely mitigates this effect. We release our code and pretrained checkpoints to facilitate future research on more efficient RLHF.
△ Less
Submitted 16 September, 2023;
originally announced September 2023.
-
VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference
Authors:
Soumya Banerjee,
Vinay K. Verma,
Avideep Mukherjee,
Deepak Gupta,
Vinay P. Namboodiri,
Piyush Rai
Abstract:
Lifelong learning or continual learning is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is…
▽ More
Lifelong learning or continual learning is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is streaming (observes each training example only once), requires a single pass over the data, can learn in a class-incremental manner, and can be evaluated on-the-fly (anytime inference). To accomplish these, we propose a novel \emph{virtual gradients} based approach for continual representation learning which adapts to each new example while also generalizing well on past data to prevent catastrophic forgetting. Our approach also leverages an exponential-moving-average-based semantic memory to further enhance performance. Experiments on diverse datasets with temporally correlated observations demonstrate our method's efficacy and superior performance over existing methods.
△ Less
Submitted 19 February, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Authors:
Devaansh Gupta,
Siddhant Kharbanda,
Jiawei Zhou,
Wanhua Li,
Hanspeter Pfister,
Donglai Wei
Abstract:
There has been a growing interest in developing multimodal machine translation (MMT) systems that enhance neural machine translation (NMT) with visual knowledge. This problem setup involves using images as auxiliary information during training, and more recently, eliminating their use during inference. Towards this end, previous works face a challenge in training powerful MMT models from scratch d…
▽ More
There has been a growing interest in developing multimodal machine translation (MMT) systems that enhance neural machine translation (NMT) with visual knowledge. This problem setup involves using images as auxiliary information during training, and more recently, eliminating their use during inference. Towards this end, previous works face a challenge in training powerful MMT models from scratch due to the scarcity of annotated multilingual vision-language data, especially for low-resource languages. Simultaneously, there has been an influx of multilingual pre-trained models for NMT and multimodal pre-trained models for vision-language tasks, primarily in English, which have shown exceptional generalisation ability. However, these are not directly applicable to MMT since they do not provide aligned multimodal multilingual features for generative tasks. To alleviate this issue, instead of designing complex modules for MMT, we propose CLIPTrans, which simply adapts the independently pre-trained multimodal M-CLIP and the multilingual mBART. In order to align their embedding spaces, mBART is conditioned on the M-CLIP features by a prefix sequence generated through a lightweight mapping network. We train this in a two-stage pipeline which warms up the model with image captioning before the actual translation task. Through experiments, we demonstrate the merits of this framework and consequently push forward the state-of-the-art across standard benchmarks by an average of +2.67 BLEU. The code can be found at www.github.com/devaansh100/CLIPTrans.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Integration of Digital Twin and Federated Learning for Securing Vehicular Internet of Things
Authors:
Deepti Gupta,
Shafika Showkat Moni,
Ali Saman Tosun
Abstract:
In the present era of advanced technology, the Internet of Things (IoT) plays a crucial role in enabling smart connected environments. This includes various domains such as smart homes, smart healthcare, smart cities, smart vehicles, and many others.With ubiquitous smart connected devices and systems, a large amount of data associated with them is at a prime risk from malicious entities (e.g., use…
▽ More
In the present era of advanced technology, the Internet of Things (IoT) plays a crucial role in enabling smart connected environments. This includes various domains such as smart homes, smart healthcare, smart cities, smart vehicles, and many others.With ubiquitous smart connected devices and systems, a large amount of data associated with them is at a prime risk from malicious entities (e.g., users, devices, applications) in these systems. Innovative technologies, including cloud computing, Machine Learning (ML), and data analytics, support the development of anomaly detection models for the Vehicular Internet of Things (V-IoT), which encompasses collaborative automatic driving and enhanced transportation systems. However, traditional centralized anomaly detection models fail to provide better services for connected vehicles due to issues such as high latency, privacy leakage, performance overhead, and model drift. Recently, Federated Learning (FL) has gained significant recognition for its ability to address data privacy concerns in the IoT domain. Digital Twin (DT), proves beneficial in addressing uncertain crises and data security issues by creating a virtual replica that simulates various factors, including traffic trajectories, city policies, and vehicle utilization. However, the effectiveness of a V-IoT DT system heavily relies on the collection of long-term and high-quality data to make appropriate decisions. This paper introduces a Hierarchical Federated Learning (HFL) based anomaly detection model for V-IoT, aiming to enhance the accuracy of the model. Our proposed model integrates both DT and HFL approaches to create a comprehensive system for detecting malicious activities using an anomaly detection model. Additionally, real-world V-IoT use case scenarios are presented to demonstrate the application of the proposed model.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Image Segmentation Keras : Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras
Authors:
Divam Gupta
Abstract:
Semantic segmentation plays a vital role in computer vision tasks, enabling precise pixel-level understanding of images. In this paper, we present a comprehensive library for semantic segmentation, which contains implementations of popular segmentation models like SegNet, FCN, UNet, and PSPNet. We also evaluate and compare these models on several datasets, offering researchers and practitioners a…
▽ More
Semantic segmentation plays a vital role in computer vision tasks, enabling precise pixel-level understanding of images. In this paper, we present a comprehensive library for semantic segmentation, which contains implementations of popular segmentation models like SegNet, FCN, UNet, and PSPNet. We also evaluate and compare these models on several datasets, offering researchers and practitioners a powerful toolset for tackling diverse segmentation challenges.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
The Potential and Pitfalls of using a Large Language Model such as ChatGPT or GPT-4 as a Clinical Assistant
Authors:
Jingqing Zhang,
Kai Sun,
Akshay Jagadeesh,
Mahta Ghahfarokhi,
Deepa Gupta,
Ashok Gupta,
Vibhor Gupta,
Yike Guo
Abstract:
Recent studies have demonstrated promising performance of ChatGPT and GPT-4 on several medical domain tasks. However, none have assessed its performance using a large-scale real-world electronic health record database, nor have evaluated its utility in providing clinical diagnostic assistance for patients across a full range of disease presentation. We performed two analyses using ChatGPT and GPT-…
▽ More
Recent studies have demonstrated promising performance of ChatGPT and GPT-4 on several medical domain tasks. However, none have assessed its performance using a large-scale real-world electronic health record database, nor have evaluated its utility in providing clinical diagnostic assistance for patients across a full range of disease presentation. We performed two analyses using ChatGPT and GPT-4, one to identify patients with specific medical diagnoses using a real-world large electronic health record database and the other, in providing diagnostic assistance to healthcare workers in the prospective evaluation of hypothetical patients. Our results show that GPT-4 across disease classification tasks with chain of thought and few-shot prompting can achieve performance as high as 96% F1 scores. For patient assessment, GPT-4 can accurately diagnose three out of four times. However, there were mentions of factually incorrect statements, overlooking crucial medical findings, recommendations for unnecessary investigations and overtreatment. These issues coupled with privacy concerns, make these models currently inadequate for real world clinical use. However, limited data and time needed for prompt engineering in comparison to configuration of conventional machine learning workflows highlight their potential for scalability across healthcare applications.
△ Less
Submitted 16 July, 2023;
originally announced July 2023.
-
Sem-CS: Semantic CLIPStyler for Text-Based Image Style Transfer
Authors:
Chanda Grover Kamra,
Indra Deep Mastan,
Debayan Gupta
Abstract:
CLIPStyler demonstrated image style transfer with realistic textures using only a style text description (instead of requiring a reference style image). However, the ground semantics of objects in the style transfer output is lost due to style spill-over on salient and background objects (content mismatch) or over-stylization. To solve this, we propose Semantic CLIPStyler (Sem-CS), that performs s…
▽ More
CLIPStyler demonstrated image style transfer with realistic textures using only a style text description (instead of requiring a reference style image). However, the ground semantics of objects in the style transfer output is lost due to style spill-over on salient and background objects (content mismatch) or over-stylization. To solve this, we propose Semantic CLIPStyler (Sem-CS), that performs semantic style transfer. Sem-CS first segments the content image into salient and non-salient objects and then transfers artistic style based on a given style text description. The semantic style transfer is achieved using global foreground loss (for salient objects) and global background loss (for non-salient objects). Our empirical results, including DISTS, NIMA and user study scores, show that our proposed framework yields superior qualitative and quantitative performance. Our code is available at github.com/chandagrover/sem-cs.
△ Less
Submitted 12 July, 2023;
originally announced July 2023.
-
Latent Graph Attention for Enhanced Spatial Context
Authors:
Ayush Singh,
Yash Bhambhu,
Himanshu Buckchash,
Deepak K. Gupta,
Dilip K. Prasad
Abstract:
Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent, however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph A…
▽ More
Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent, however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA) a computationally inexpensive (linear to the number of nodes) and stable, modular framework for incorporating the global context in the existing architectures, especially empowering small-scale architectures to give performance closer to large size architectures, thus making the light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating to construct a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby being able to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.
△ Less
Submitted 12 July, 2023; v1 submitted 9 July, 2023;
originally announced July 2023.
-
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
Authors:
Abhirama Subramanyam Penamakuri,
Manish Gupta,
Mithun Das Gupta,
Anand Mishra
Abstract:
We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). The RETVQA is distinctively di…
▽ More
We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). The RETVQA is distinctively different and more challenging than the traditionally-studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and retrieved images using our relevance encoder for free-form fluent answer generation. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: multi-image and retrieval requirement for VQA, metadata-independent questions over a pool of heterogeneous images, expecting a mix of classification-oriented and open-ended generative answers. Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset, namely RETVQA and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Authors:
Arnav Chavan,
Zhuang Liu,
Deepak Gupta,
Eric Xing,
Zhiqiang Shen
Abstract:
We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptatio…
▽ More
We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptation by employing a scalable, modular, layer-wise structure search that learns individual adapter of each layer. Originating from a unified mathematical formulation, GLoRA exhibits strong transfer learning, few-shot learning and domain generalization abilities, as it adapts to new tasks through not only weights but also additional dimensions like activations. Comprehensive experiments demonstrate that GLoRA outperforms all previous methods in natural, specialized, and structured vision benchmarks, achieving superior accuracy with fewer parameters and computations. The proposed method on LLaMA-1 and LLaMA-2 also show considerable enhancements compared to the original LoRA in the language domain. Furthermore, our structural re-parameterization design ensures that GLoRA incurs no extra inference cost, rendering it a practical solution for resource-limited applications. Code and models are available at: https://github.com/Arnav0400/ViT-Slim/tree/master/GLoRA.
△ Less
Submitted 16 October, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.