Search | arXiv e-print repository

arXiv:2406.19299 [pdf, other]

PNeRV: A Polynomial Neural Representation for Videos

Authors: Sonam Gupta, Snehal Singh Tomar, Grigorios G Chrysos, Sukhendu Das, A. N. Rajagopalan

Abstract: Extracting Implicit Neural Representations (INRs) on video data poses unique challenges due to the additional temporal dimension. In the context of videos, INRs have predominantly relied on a frame-only parameterization, which sacrifices the spatiotemporal continuity observed in pixel-level (spatial) representations. To mitigate this, we introduce Polynomial Neural Representation for Videos (PNeRV… ▽ More Extracting Implicit Neural Representations (INRs) on video data poses unique challenges due to the additional temporal dimension. In the context of videos, INRs have predominantly relied on a frame-only parameterization, which sacrifices the spatiotemporal continuity observed in pixel-level (spatial) representations. To mitigate this, we introduce Polynomial Neural Representation for Videos (PNeRV), a parameter-wise efficient, patch-wise INR for videos that preserves spatiotemporal continuity. PNeRV leverages the modeling capabilities of Polynomial Neural Networks to perform the modulation of a continuous spatial (patch) signal with a continuous time (frame) signal. We further propose a custom Hierarchical Patch-wise Spatial Sampling Scheme that ensures spatial continuity while retaining parameter efficiency. We also employ a carefully designed Positional Embedding methodology to further enhance PNeRV's performance. Our extensive experimentation demonstrates that PNeRV outperforms the baselines in conventional Implicit Neural Representation tasks like compression along with downstream applications that require spatiotemporal continuity in the underlying representation. PNeRV not only addresses the challenges posed by video data in the realm of INRs but also opens new avenues for advanced video processing and analysis. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 25 pages, 17 figures, published at TMLR, Feb 2024

arXiv:2406.19102 [pdf, other]

Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs

Authors: Lokesh Mishra, Sohayl Dhibi, Yusik Kim, Cesar Berrospi Ramis, Shubham Gupta, Michele Dolfi, Peter Staar

Abstract: Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable quantitative information through tables. Unfortunately, extracting this information is difficult due to high variability in the table structure as… ▽ More Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable quantitative information through tables. Unfortunately, extracting this information is difficult due to high variability in the table structure as well as content. We propose Statements, a novel domain agnostic data structure for extracting quantitative facts and related information. We propose translating tables to statements as a new supervised deep-learning universal information extraction task. We introduce SemTabNet - a dataset of over 100K annotated tables. Investigating a family of T5-based Statement Extraction Models, our best model generates statements which are 82% similar to the ground-truth (compared to baseline of 21%). We demonstrate the advantages of statements by applying our model to over 2700 tables from ESG reports. The homogeneous nature of statements permits exploratory data analysis on expansive information found in large collections of ESG reports. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: Accepted at the NLP4Climate workshop in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

arXiv:2406.17713 [pdf, other]

Multi-objective Binary Differential Approach with Parameter Tuning for Discovering Business Process Models: MoD-ProM

Authors: Sonia Deshmukh, Shikha Gupta, Naveen Kumar

Abstract: Process discovery approaches analyze the business data to automatically uncover structured information, known as a process model. The quality of a process model is measured using quality dimensions -- completeness (replay fitness), preciseness, simplicity, and generalization. Traditional process discovery algorithms usually output a single process model. A single model may not accurately capture t… ▽ More Process discovery approaches analyze the business data to automatically uncover structured information, known as a process model. The quality of a process model is measured using quality dimensions -- completeness (replay fitness), preciseness, simplicity, and generalization. Traditional process discovery algorithms usually output a single process model. A single model may not accurately capture the observed behavior and overfit the training data. We have formed the process discovery problem in a multi-objective framework that yields several candidate solutions for the end user who can pick a suitable model based on the local environmental constraints (possibly varying). We consider the Binary Differential Evolution approach in a multi-objective framework for the task of process discovery. The proposed method employs dichotomous crossover/mutation operators. The parameters are tuned using Grey relational analysis combined with the Taguchi approach. {We have compared the proposed approach with the well-known single-objective algorithms and state-of-the-art multi-objective evolutionary algorithm -- Non-dominated Sorting Genetic Algorithm (NSGA-II).} Additional comparison via computing a weighted average of the quality dimensions is also undertaken. Results show that the proposed algorithm is computationally efficient and produces diversified candidate solutions that score high on the fitness functions. It is shown that the process models generated by the proposed approach are superior to or at least as good as those generated by the state-of-the-art algorithms. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.15958 [pdf, other]

Bone Fracture Classification using Transfer Learning

Authors: Shyam Gupta, Dhanisha Sharma

Abstract: The manual examination of X-ray images for fractures is a time-consuming process that is prone to human error. In this work, we introduce a robust yet simple training loop for the classification of fractures, which significantly outperforms existing methods. Our method achieves superior performance in less than ten epochs and utilizes the latest dataset to deliver the best-performing model for thi… ▽ More The manual examination of X-ray images for fractures is a time-consuming process that is prone to human error. In this work, we introduce a robust yet simple training loop for the classification of fractures, which significantly outperforms existing methods. Our method achieves superior performance in less than ten epochs and utilizes the latest dataset to deliver the best-performing model for this task. We emphasize the importance of training deep learning models responsibly and efficiently, as well as the critical role of selecting high-quality datasets. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: code is publicly available at - https://github.com/shyamgupta196/Bone-Fracture-Classification

arXiv:2406.14706 [pdf]

SWANN: Shuffling Weights in Crossbar Arrays for Enhanced DNN Accuracy in Deeply Scaled Technologies

Authors: Jeffry Victor, Dong Eun Kim, Chunguang Wang, Kaushik Roy, Sumeet Gupta

Abstract: Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. However, in deeply scaled technologies, interconnect resistance severely impairs IMC robustness, leading to a drop in the system accuracy. To address this problem, we propose SWANN - a technique based on shuffling weights in crossbar arrays whic… ▽ More Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. However, in deeply scaled technologies, interconnect resistance severely impairs IMC robustness, leading to a drop in the system accuracy. To address this problem, we propose SWANN - a technique based on shuffling weights in crossbar arrays which alleviates the detrimental effect of wire resistance on IMC. For 8T-SRAM-based 128x128 crossbar arrays in 7nm technology, SWANN enhances the accuracy from 47.78% to 83.5% for ResNet-20/CIFAR-10. We also show that SWANN can be used synergistically with Partial-Word-LineActivation, further boosting the accuracy. Moreover, we evaluate the implications of SWANN for compact ferroelectric-transistorbased crossbar arrays. SWANN incurs minimal hardware overhead, with less than a 1% increase in energy consumption. Additionally, the latency and area overheads of SWANN are ~1% and ~16%, respectively when 1 ADC is utilized per crossbar array. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14398 [pdf, other]

ATAC-Net: Zoomed view works better for Anomaly Detection

Authors: Shaurya Gupta, Neil Gautam, Anurag Malyala

Abstract: The application of deep learning in visual anomaly detection has gained widespread popularity due to its potential use in quality control and manufacturing. Current standard methods are Unsupervised, where a clean dataset is utilised to detect deviations and flag anomalies during testing. However, incorporating a few samples when the type of anomalies is known beforehand can significantly enhance… ▽ More The application of deep learning in visual anomaly detection has gained widespread popularity due to its potential use in quality control and manufacturing. Current standard methods are Unsupervised, where a clean dataset is utilised to detect deviations and flag anomalies during testing. However, incorporating a few samples when the type of anomalies is known beforehand can significantly enhance performance. Thus, we propose ATAC-Net, a framework that trains to detect anomalies from a minimal set of known prior anomalies. Furthermore, we introduce attention-guided cropping, which provides a closer view of suspect regions during the training phase. Our framework is a reliable and easy-to-understand system for detecting anomalies, and we substantiate its superiority to some of the current state-of-the-art techniques in a comparable setting. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14330 [pdf, other]

Promise of Graph Sparsification and Decomposition for Noise Reduction in QAOA: Analysis for Trapped-Ion Compilations

Authors: Jai Moondra, Philip C. Lotshaw, Greg Mohler, Swati Gupta

Abstract: We develop new approximate compilation schemes that significantly reduce the expense of compiling the Quantum Approximate Optimization Algorithm (QAOA) for solving the Max-Cut problem. Our main focus is on compilation with trapped-ion simulators using Pauli-$X$ operations and all-to-all Ising Hamiltonian $H_\text{Ising}$ evolution generated by Molmer-Sorensen or optical dipole force interactions,… ▽ More We develop new approximate compilation schemes that significantly reduce the expense of compiling the Quantum Approximate Optimization Algorithm (QAOA) for solving the Max-Cut problem. Our main focus is on compilation with trapped-ion simulators using Pauli-$X$ operations and all-to-all Ising Hamiltonian $H_\text{Ising}$ evolution generated by Molmer-Sorensen or optical dipole force interactions, though some of our results also apply to standard gate-based compilations. Our results are based on principles of graph sparsification and decomposition; the former reduces the number of edges in a graph while maintaining its cut structure, while the latter breaks a weighted graph into a small number of unweighted graphs. Though these techniques have been used as heuristics in various hybrid quantum algorithms, there have been no guarantees on their performance, to the best of our knowledge. This work provides the first provable guarantees using sparsification and decomposition to improve quantum noise resilience and reduce quantum circuit complexity. For quantum hardware that uses edge-by-edge QAOA compilations, sparsification leads to a direct reduction in circuit complexity. For trapped-ion quantum simulators implementing all-to-all $H_\text{Ising}$ pulses, we show that for a $(1-ε)$ factor loss in the Max-Cut approximation ($ε>0)$, our compilations improve the (worst-case) number of $H_\text{Ising}$ pulses from $O(n^2)$ to $O(n\log(n/ε))$ and the (worst-case) number of Pauli-$X$ bit flips from $O(n^2)$ to $O\left(\frac{n\log(n/ε)}{ε^2}\right)$ for $n$-node graphs. We demonstrate significant reductions in noise are obtained in our new compilation approaches using theory and numerical calculations for trapped-ion hardware. We anticipate these approximate compilation techniques will be useful tools in a variety of future quantum computing experiments. △ Less

Submitted 20 June, 2024; originally announced June 2024.

MSC Class: 81P68

arXiv:2406.11784 [pdf, other]

MDCR: A Dataset for Multi-Document Conditional Reasoning

Authors: Peter Baile Chen, Yi Zhang, Chunwei Liu, Sejal Gupta, Yoon Kim, Michael Cafarella

Abstract: The same real-life questions posed to different individuals may lead to different answers based on their unique situations. For instance, whether a student is eligible for a scholarship depends on eligibility conditions, such as major or degree required. ConditionalQA was proposed to evaluate models' capability of reading a document and answering eligibility questions, considering unmentioned cond… ▽ More The same real-life questions posed to different individuals may lead to different answers based on their unique situations. For instance, whether a student is eligible for a scholarship depends on eligibility conditions, such as major or degree required. ConditionalQA was proposed to evaluate models' capability of reading a document and answering eligibility questions, considering unmentioned conditions. However, it is limited to questions on single documents, neglecting harder cases that may require cross-document reasoning and optimization, for example, "What is the maximum number of scholarships attainable?" Such questions over multiple documents are not only more challenging due to more context having to understand, but also because the model has to (1) explore all possible combinations of unmentioned conditions and (2) understand the relationship between conditions across documents, to reason about the optimal outcome. To evaluate models' capability of answering such questions, we propose a new dataset MDCR, which can reflect real-world challenges and serve as a new test bed for complex conditional reasoning that requires optimization. We evaluate this dataset using the most recent LLMs and demonstrate their limitations in solving this task. We believe this dataset will facilitate future research in answering optimization questions with unknown conditions. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10528 [pdf, other]

Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training

Authors: Akul Malhotra, Sumeet Kumar Gupta

Abstract: Improving the hardware efficiency of deep neural network (DNN) accelerators with techniques such as quantization and sparsity enhancement have shown an immense promise. However, their inference accuracy in non-ideal real-world settings (such as in the presence of hardware faults) is yet to be systematically analyzed. In this work, we investigate the impact of memory faults on activation-sparse qua… ▽ More Improving the hardware efficiency of deep neural network (DNN) accelerators with techniques such as quantization and sparsity enhancement have shown an immense promise. However, their inference accuracy in non-ideal real-world settings (such as in the presence of hardware faults) is yet to be systematically analyzed. In this work, we investigate the impact of memory faults on activation-sparse quantized DNNs (AS QDNNs). We show that a high level of activation sparsity comes at the cost of larger vulnerability to faults, with AS QDNNs exhibiting up to 11.13% lower accuracy than the standard QDNNs. We establish that the degraded accuracy correlates with a sharper minima in the loss landscape for AS QDNNs, which makes them more sensitive to perturbations in the weight values due to faults. Based on this observation, we employ sharpness-aware quantization (SAQ) training to mitigate the impact of memory faults. The AS and standard QDNNs trained with SAQ have up to 19.50% and 15.82% higher inference accuracy, respectively compared to their conventionally trained equivalents. Moreover, we show that SAQ-trained AS QDNNs show higher accuracy in faulty settings than standard QDNNs trained conventionally. Thus, sharpness-aware training can be instrumental in achieving sparsity-related latency benefits without compromising on fault tolerance. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2301.00675

arXiv:2406.10422 [pdf, other]

Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice

Authors: Shubham Gupta, Mirco Ravanelli, Pascal Germain, Cem Subakan

Abstract: In this paper, we propose Phoneme Discretized Saliency Maps (PDSM), a discretization algorithm for saliency maps that takes advantage of phoneme boundaries for explainable detection of AI-generated voice. We experimentally show with two different Text-to-Speech systems (i.e., Tacotron2 and Fastspeech2) that the proposed algorithm produces saliency maps that result in more faithful explanations com… ▽ More In this paper, we propose Phoneme Discretized Saliency Maps (PDSM), a discretization algorithm for saliency maps that takes advantage of phoneme boundaries for explainable detection of AI-generated voice. We experimentally show with two different Text-to-Speech systems (i.e., Tacotron2 and Fastspeech2) that the proposed algorithm produces saliency maps that result in more faithful explanations compared to standard posthoc explanation methods. Moreover, by associating the saliency maps to the phoneme representations, this methodology generates explanations that tend to be more understandable than standard saliency maps on magnitude spectrograms. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2406.10090 [pdf, other]

Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

Authors: Zhang Chen, Luca Demetrio, Srishti Gupta, Xiaoyi Feng, Zhaoqiang Xia, Antonio Emanuele Cinà, Maura Pintor, Luca Oneto, Ambra Demontis, Battista Biggio, Fabio Roli

Abstract: Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one of the main suspects of the neural networks' vulnerability to adversarial example -- input samples crafted ad-hoc to induce a desired misclassification. Relevant literature has claimed contradictory remarks in… ▽ More Thanks to their extensive capacity, over-parameterized neural networks exhibit superior predictive capabilities and generalization. However, having a large parameter space is considered one of the main suspects of the neural networks' vulnerability to adversarial example -- input samples crafted ad-hoc to induce a desired misclassification. Relevant literature has claimed contradictory remarks in support of and against the robustness of over-parameterized networks. These contradictory findings might be due to the failure of the attack employed to evaluate the networks' robustness. Previous research has demonstrated that depending on the considered model, the algorithm employed to generate adversarial examples may not function properly, leading to overestimating the model's robustness. In this work, we empirically study the robustness of over-parameterized networks against adversarial examples. However, unlike the previous works, we also evaluate the considered attack's reliability to support the results' veracity. Our results show that over-parameterized networks are robust against adversarial attacks as opposed to their under-parameterized counterparts. △ Less

Submitted 14 June, 2024; originally announced June 2024.

MSC Class: 68T10 ACM Class: I.5

arXiv:2406.09208 [pdf, other]

Python-based DSL for generating Verilog model of Synchronous Digital Circuits

Authors: Mandar Datar, Dhruva S. Hegde, Vendra Durga Prasad, Manish Prajapati, Neralla Manikanta, Devansh Gupta, Janampalli Pavanija, Pratyush Pare, Akash, Shivam Gupta, Sachin B. Patkar

Abstract: We have designed a Python-based Domain Specific Language (DSL) for modeling synchronous digital circuits. In this DSL, hardware is modeled as a collection of transactions -- running in series, parallel, and loops. When the model is executed by a Python interpreter, synthesizable and behavioural Verilog is generated as output, which can be integrated with other RTL designs or directly used for FPGA… ▽ More We have designed a Python-based Domain Specific Language (DSL) for modeling synchronous digital circuits. In this DSL, hardware is modeled as a collection of transactions -- running in series, parallel, and loops. When the model is executed by a Python interpreter, synthesizable and behavioural Verilog is generated as output, which can be integrated with other RTL designs or directly used for FPGA and ASIC flows. In this paper, we describe - 1) the language (DSL), which allows users to express computation in series/parallel/loop constructs, with explicit cycle boundaries, 2) the internals of a simple Python implementation to produce synthesizable Verilog, and 3) several design examples and case studies for applications in post-quantum cryptography, stereo-vision, digital signal processing and optimization techniques. In the end, we list ideas to extend this framework. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 9 pages, 13 figures

arXiv:2406.06608 [pdf, other]

The Prompt Report: A Systematic Survey of Prompting Techniques

Authors: Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker , et al. (6 additional authors not shown)

Abstract: Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a p… ▽ More Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area's nascency. This paper establishes a structured understanding of prompts, by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 vocabulary terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting. △ Less

Submitted 16 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.00924 [pdf, ps, other]

Faster Diffusion-based Sampling with Randomized Midpoints: Sequential and Parallel

Authors: Shivam Gupta, Linda Cai, Sitan Chen

Abstract: In recent years, there has been a surge of interest in proving discretization bounds for diffusion models. These works show that for essentially any data distribution, one can approximately sample in polynomial time given a sufficiently accurate estimate of its score functions at different noise levels. In this work, we propose a new discretization scheme for diffusion models inspired by Shen and… ▽ More In recent years, there has been a surge of interest in proving discretization bounds for diffusion models. These works show that for essentially any data distribution, one can approximately sample in polynomial time given a sufficiently accurate estimate of its score functions at different noise levels. In this work, we propose a new discretization scheme for diffusion models inspired by Shen and Lee's randomized midpoint method for log-concave sampling~\cite{ShenL19}. We prove that this approach achieves the best known dimension dependence for sampling from arbitrary smooth distributions in total variation distance ($\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work). We also show that our algorithm can be parallelized to run in only $\widetilde O(\log^2 d)$ parallel rounds, constituting the first provable guarantees for parallel sampling with diffusion models. As a byproduct of our methods, for the well-studied problem of log-concave sampling in total variation distance, we give an algorithm and simple analysis achieving dimension dependence $\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.18193 [pdf, other]

In-Context Symmetries: Self-Supervised Learning through Contextual World Models

Authors: Sharut Gupta, Chenyu Wang, Yifei Wang, Tommi Jaakkola, Stefanie Jegelka

Abstract: At the core of self-supervised learning for vision is the idea of learning invariant or equivariant representations with respect to a set of data transformations. This approach, however, introduces strong inductive biases, which can render the representations fragile in downstream tasks that do not conform to these symmetries. In this work, drawing insights from world models, we propose to instead… ▽ More At the core of self-supervised learning for vision is the idea of learning invariant or equivariant representations with respect to a set of data transformations. This approach, however, introduces strong inductive biases, which can render the representations fragile in downstream tasks that do not conform to these symmetries. In this work, drawing insights from world models, we propose to instead learn a general representation that can adapt to be invariant or equivariant to different transformations by paying attention to context -- a memory module that tracks task-specific states, actions, and future states. Here, the action is the transformation, while the current and future states respectively represent the input's representation before and after the transformation. Our proposed algorithm, Contextual Self-Supervised Learning (ContextSSL), learns equivariance to all transformations (as opposed to invariance). In this way, the model can learn to encode all relevant features as general representations while having the versatility to tail down to task-wise symmetries when given a few examples as the context. Empirically, we demonstrate significant performance gains over existing methods on equivariance-related tasks, supported by both qualitative and quantitative evaluations. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 32 pages, 24 tables and 11 figures

arXiv:2405.17469 [pdf, other]

A Dataset for Research on Water Sustainability

Authors: Pranjol Sen Gupta, Md Rajib Hossen, Pengfei Li, Shaolei Ren, Mohammad A. Islam

Abstract: Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within the temporal and spatial variations. To break this barrier into research in water sustainability, we build a dataset for operation direct water usage in… ▽ More Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within the temporal and spatial variations. To break this barrier into research in water sustainability, we build a dataset for operation direct water usage in the cooling systems and indirect water embedded in electricity generation. Our dataset consists of the hourly water efficiency of major U.S. cities and states from 2019 to 2023. We also offer cooling system models that capture the impact of weather on water efficiency. We present a preliminary analysis of our dataset and discuss three potential applications that can benefit from it. Our dataset is publicly available at Open Science Framework (OSF) △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted by ACM e-Energy 2024

arXiv:2405.15372 [pdf, other]

When far is better: The Chamberlin-Courant approach to obnoxious committee selection

Authors: Sushmita Gupta, Tanmay Inamdar, Pallavi Jain, Daniel Lokshtanov, Fahad Panolan, Saket Saurabh

Abstract: Classical work on metric space based committee selection problem interprets distance as ``near is better''. In this work, motivated by real-life situations, we interpret distance as ``far is better''. Formally stated, we initiate the study of ``obnoxious'' committee scoring rules when the voters' preferences are expressed via a metric space. To this end, we propose a model where large distances im… ▽ More Classical work on metric space based committee selection problem interprets distance as ``near is better''. In this work, motivated by real-life situations, we interpret distance as ``far is better''. Formally stated, we initiate the study of ``obnoxious'' committee scoring rules when the voters' preferences are expressed via a metric space. To this end, we propose a model where large distances imply high satisfaction and study the egalitarian avatar of the well-known Chamberlin-Courant voting rule and some of its generalizations. For a given integer value $1 \le λ\le k$, the committee size k, a voter derives satisfaction from only the $λ$-th favorite committee member; the goal is to maximize the satisfaction of the least satisfied voter. For the special case of $λ= 1$, this yields the egalitarian Chamberlin-Courant rule. In this paper, we consider general metric space and the special case of a $d$-dimensional Euclidean space. We show that when $λ$ is $1$ and $k$, the problem is polynomial-time solvable in $\mathbb{R}^2$ and general metric space, respectively. However, for $λ= k-1$, it is NP-hard even in $\mathbb{R}^2$. Thus, we have ``double-dichotomy'' in $\mathbb{R}^2$ with respect to the value of λ, where the extreme cases are solvable in polynomial time but an intermediate case is NP-hard. Furthermore, this phenomenon appears to be ``tight'' for $\mathbb{R}^2$ because the problem is NP-hard for general metric space, even for $λ=1$. Consequently, we are motivated to explore the problem in the realm of (parameterized) approximation algorithms and obtain positive results. Interestingly, we note that this generalization of Chamberlin-Courant rules encodes practical constraints that are relevant to solutions for certain facility locations. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15254 [pdf, other]

Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime

Authors: Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh

Abstract: This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology, assuming only finite-energy neural activations; and a novel representor theory for neural networks in terms of a matrix-valued kernel. The first model is exact (un-approximated) and global, casting the neural network as an elements in a reproducing kernel Banac… ▽ More This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology, assuming only finite-energy neural activations; and a novel representor theory for neural networks in terms of a matrix-valued kernel. The first model is exact (un-approximated) and global, casting the neural network as an elements in a reproducing kernel Banach space (RKBS); we use this model to provide tight bounds on Rademacher complexity. The second model is exact and local, casting the change in neural network function resulting from a bounded change in weights and biases (ie. a training step) in reproducing kernel Hilbert space (RKHS) in terms of a local-intrinsic neural kernel (LiNK). This local model provides insight into model adaptation through tight bounds on Rademacher complexity of network adaptation. We also prove that the neural tangent kernel (NTK) is a first-order approximation of the LiNK kernel. Finally, and noting that the LiNK does not provide a representor theory for technical reasons, we present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK). This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models. Throughout the paper (a) feedforward ReLU networks and (b) residual networks (ResNet) are used as illustrative examples. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15227 [pdf, other]

Neural Elevation Models for Terrain Mapping and Path Planning

Authors: Adam Dai, Shubh Gupta, Grace Gao

Abstract: This work introduces Neural Elevations Models (NEMos), which adapt Neural Radiance Fields to a 2.5D continuous and differentiable terrain model. In contrast to traditional terrain representations such as digital elevation models, NEMos can be readily generated from imagery, a low-cost data source, and provide a lightweight representation of terrain through an implicit continuous and differentiable… ▽ More This work introduces Neural Elevations Models (NEMos), which adapt Neural Radiance Fields to a 2.5D continuous and differentiable terrain model. In contrast to traditional terrain representations such as digital elevation models, NEMos can be readily generated from imagery, a low-cost data source, and provide a lightweight representation of terrain through an implicit continuous and differentiable height field. We propose a novel method for jointly training a height field and radiance field within a NeRF framework, leveraging quantile regression. Additionally, we introduce a path planning algorithm that performs gradient-based optimization of a continuous cost function for minimizing distance, slope changes, and control effort, enabled by differentiability of the height field. We perform experiments on simulated and real-world terrain imagery, demonstrating NEMos ability to generate high-quality reconstructions and produce smoother paths compared to discrete path planning methods. Future work will explore the incorporation of features and semantics into the height field, creating a generalized terrain model. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.11458 [pdf, other]

CPS-LLM: Large Language Model based Safe Usage Plan Generator for Human-in-the-Loop Human-in-the-Plant Cyber-Physical System

Authors: Ayan Banerjee, Aranyak Maity, Payal Kamboj, Sandeep K. S. Gupta

Abstract: We explore the usage of large language models (LLM) in human-in-the-loop human-in-the-plant cyber-physical systems (CPS) to translate a high-level prompt into a personalized plan of actions, and subsequently convert that plan into a grounded inference of sequential decision-making automated by a real-world CPS controller to achieve a control goal. We show that it is relatively straightforward to c… ▽ More We explore the usage of large language models (LLM) in human-in-the-loop human-in-the-plant cyber-physical systems (CPS) to translate a high-level prompt into a personalized plan of actions, and subsequently convert that plan into a grounded inference of sequential decision-making automated by a real-world CPS controller to achieve a control goal. We show that it is relatively straightforward to contextualize an LLM so it can generate domain-specific plans. However, these plans may be infeasible for the physical system to execute or the plan may be unsafe for human users. To address this, we propose CPS-LLM, an LLM retrained using an instruction tuning framework, which ensures that generated plans not only align with the physical system dynamics of the CPS but are also safe for human users. The CPS-LLM consists of two innovative components: a) a liquid time constant neural network-based physical dynamics coefficient estimator that can derive coefficients of dynamical models with some unmeasured state variables; b) the model coefficients are then used to train an LLM with prompts embodied with traces from the dynamical system and the corresponding model coefficients. We show that when the CPS-LLM is integrated with a contextualized chatbot such as BARD it can generate feasible and safe plans to manage external events such as meals for automated insulin delivery systems used by Type 1 Diabetes subjects. △ Less

Submitted 19 May, 2024; originally announced May 2024.

Comments: Accepted for publication in AAAI 2024, Planning for Cyber Physical Systems

arXiv:2405.06049 [pdf, other]

BB-Patch: BlackBox Adversarial Patch-Attack using Zeroth-Order Optimization

Authors: Satyadwyoom Kumar, Saurabh Gupta, Arun Balaji Buduru

Abstract: Deep Learning has become popular due to its vast applications in almost all domains. However, models trained using deep learning are prone to failure for adversarial samples and carry a considerable risk in sensitive applications. Most of these adversarial attack strategies assume that the adversary has access to the training data, the model parameters, and the input during deployment, hence, focu… ▽ More Deep Learning has become popular due to its vast applications in almost all domains. However, models trained using deep learning are prone to failure for adversarial samples and carry a considerable risk in sensitive applications. Most of these adversarial attack strategies assume that the adversary has access to the training data, the model parameters, and the input during deployment, hence, focus on perturbing the pixel level information present in the input image. Adversarial Patches were introduced to the community which helped in bringing out the vulnerability of deep learning models in a much more pragmatic manner but here the attacker has a white-box access to the model parameters. Recently, there has been an attempt to develop these adversarial attacks using black-box techniques. However, certain assumptions such as availability large training data is not valid for a real-life scenarios. In a real-life scenario, the attacker can only assume the type of model architecture used from a select list of state-of-the-art architectures while having access to only a subset of input dataset. Hence, we propose an black-box adversarial attack strategy that produces adversarial patches which can be applied anywhere in the input image to perform an adversarial attack. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05736 [pdf, other]

Optimal Baseline Corrections for Off-Policy Contextual Bandits

Authors: Shashank Gupta, Olivier Jeunen, Harrie Oosterhuis, Maarten de Rijke

Abstract: The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. With unbiasedness comes potentially high variance, and prevalent methods exist to reduce estimation variance. These methods typically make use of cont… ▽ More The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. With unbiasedness comes potentially high variance, and prevalent methods exist to reduce estimation variance. These methods typically make use of control variates, either additive (i.e., baseline corrections or doubly robust methods) or multiplicative (i.e., self-normalisation). Our work unifies these approaches by proposing a single framework built on their equivalence in learning scenarios. The foundation of our framework is the derivation of an equivalent baseline correction for all of the existing control variates. Consequently, our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it. This optimal estimator brings significantly improved performance in both evaluation and learning, and minimizes data requirements. Empirical observations corroborate our theoretical findings. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.00970 [pdf, other]

How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses

Authors: Jionghao Lin, Zifei Han, Danielle R. Thomas, Ashish Gurung, Shivang Gupta, Vincent Aleven, Kenneth R. Koedinger

Abstract: One-on-one tutoring is widely acknowledged as an effective instructional method, conditioned on qualified tutors. However, the high demand for qualified tutors remains a challenge, often necessitating the training of novice tutors (i.e., trainees) to ensure effective tutoring. Research suggests that providing timely explanatory feedback can facilitate the training process for trainees. However, it… ▽ More One-on-one tutoring is widely acknowledged as an effective instructional method, conditioned on qualified tutors. However, the high demand for qualified tutors remains a challenge, often necessitating the training of novice tutors (i.e., trainees) to ensure effective tutoring. Research suggests that providing timely explanatory feedback can facilitate the training process for trainees. However, it presents challenges due to the time-consuming nature of assessing trainee performance by human experts. Inspired by the recent advancements of large language models (LLMs), our study employed the GPT-4 model to build an explanatory feedback system. This system identifies trainees' responses in binary form (i.e., correct/incorrect) and automatically provides template-based feedback with responses appropriately rephrased by the GPT-4 model. We conducted our study on 410 responses from trainees across three training lessons: Giving Effective Praise, Reacting to Errors, and Determining What Students Know. Our findings indicate that: 1) using a few-shot approach, the GPT-4 model effectively identifies correct/incorrect trainees' responses from three training lessons with an average F1 score of 0.84 and an AUC score of 0.85; and 2) using the few-shot approach, the GPT-4 model adeptly rephrases incorrect trainees' responses into desired responses, achieving performance comparable to that of human experts. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: International Journal of Artificial Intelligence in Education

arXiv:2405.00577 [pdf]

Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review

Authors: Yi Hao Chan, Deepank Girish, Sukrit Gupta, Jing Xia, Chockalingam Kasi, Yinan He, Conghao Wang, Jagath C. Rajapakse

Abstract: Graph neural networks (GNN) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. In this review, we provide an overview of how GNN and… ▽ More Graph neural networks (GNN) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. In this review, we provide an overview of how GNN and model explainability techniques have been applied on fMRI datasets for disorder prediction tasks, with a particular emphasis on the robustness of biomarkers produced for neurodegenerative diseases and neuropsychiatric disorders. We found that while most studies have performant models, salient features highlighted in these studies vary greatly across studies on the same disorder and little has been done to evaluate their robustness. To address these issues, we suggest establishing new standards that are based on objective evaluation metrics to determine the robustness of these potential biomarkers. We further highlight gaps in the existing literature and put together a prediction-attribution-evaluation framework that could set the foundations for future research on improving the robustness of potential biomarkers discovered via GNNs. △ Less

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00554 [pdf, other]

A First Look at Selection Bias in Preference Elicitation for Recommendation

Authors: Shashank Gupta, Harrie Oosterhuis, Maarten de Rijke

Abstract: Preference elicitation explicitly asks users what kind of recommendations they would like to receive. It is a popular technique for conversational recommender systems to deal with cold-starts. Previous work has studied selection bias in implicit feedback, e.g., clicks, and in some forms of explicit feedback, i.e., ratings on items. Despite the fact that the extreme sparsity of preference elicitati… ▽ More Preference elicitation explicitly asks users what kind of recommendations they would like to receive. It is a popular technique for conversational recommender systems to deal with cold-starts. Previous work has studied selection bias in implicit feedback, e.g., clicks, and in some forms of explicit feedback, i.e., ratings on items. Despite the fact that the extreme sparsity of preference elicitation interactions make them severely more prone to selection bias than natural interactions, the effect of selection bias in preference elicitation on the resulting recommendations has not been studied yet. To address this gap, we take a first look at the effects of selection bias in preference elicitation and how they may be further investigated in the future. We find that a big hurdle is the current lack of any publicly available dataset that has preference elicitation interactions. As a solution, we propose a simulation of a topic-based preference elicitation process. The results from our simulation-based experiments indicate (i) that ignoring the effect of selection bias early in preference elicitation can lead to an exacerbation of overrepresentation in subsequent item recommendations, and (ii) that debiasing methods can alleviate this effect, which leads to significant improvements in subsequent item recommendation performance. Our aim is for the proposed simulator and initial results to provide a starting point and motivation for future research into this important but overlooked problem setting. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted at the CONSEQUENCES'23 workshop at RecSys '23

arXiv:2404.19281 [pdf, other]

Audio-Visual Traffic Light State Detection for Urban Robots

Authors: Sagar Gupta, Akansel Cosgun

Abstract: We present a multimodal traffic light state detection using vision and sound, from the viewpoint of a quadruped robot navigating in urban settings. This is a challenging problem because of the visual occlusions and noise from robot locomotion. Our method combines features from raw audio with the ratios of red and green pixels within bounding boxes, identified by established vision-based detectors.… ▽ More We present a multimodal traffic light state detection using vision and sound, from the viewpoint of a quadruped robot navigating in urban settings. This is a challenging problem because of the visual occlusions and noise from robot locomotion. Our method combines features from raw audio with the ratios of red and green pixels within bounding boxes, identified by established vision-based detectors. The fusion method aggregates features across multiple frames in a given timeframe, increasing robustness and adaptability. Results show that our approach effectively addresses the challenge of visual occlusion and surpasses the performance of single-modality solutions when the robot is in motion. This study serves as a proof of concept, highlighting the significant, yet often overlooked, potential of multi-modal perception in robotics. △ Less

Submitted 30 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

arXiv:2404.18400 [pdf, other]

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Authors: Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan K Reddy

Abstract: Mathematical equations have been unreasonably effective in describing complex natural phenomena across various scientific disciplines. However, discovering such insightful equations from data presents significant challenges due to the necessity of navigating extremely high-dimensional combinatorial and nonlinear hypothesis spaces. Traditional methods of equation discovery, commonly known as symbol… ▽ More Mathematical equations have been unreasonably effective in describing complex natural phenomena across various scientific disciplines. However, discovering such insightful equations from data presents significant challenges due to the necessity of navigating extremely high-dimensional combinatorial and nonlinear hypothesis spaces. Traditional methods of equation discovery, commonly known as symbolic regression, largely focus on extracting equations from data alone, often neglecting the rich domain-specific prior knowledge that scientists typically depend on. To bridge this gap, we introduce LLM-SR, a novel approach that leverages the extensive scientific knowledge and robust code generation capabilities of Large Language Models (LLMs) to discover scientific equations from data in an efficient manner. Specifically, LLM-SR treats equations as programs with mathematical operators and combines LLMs' scientific priors with evolutionary search over equation programs. The LLM iteratively proposes new equation skeleton hypotheses, drawing from its physical understanding, which are then optimized against data to estimate skeleton parameters. We demonstrate LLM-SR's effectiveness across three diverse scientific domains, where it discovers physically accurate equations that provide significantly better fits to in-domain and out-of-domain data compared to the well-established symbolic regression baselines. Incorporating scientific prior knowledge also enables LLM-SR to search the equation space more efficiently than baselines. Code is available at: https://github.com/deep-symbolic-mathematics/LLM-SR △ Less

Submitted 2 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.17607 [pdf, other]

Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions

Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Caleb Henry, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

Abstract: The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also emerges as a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countr… ▽ More The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also emerges as a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countries has caused an outbreak of e-cigarette and vaping use-associated lung injury (EVALI), leading to hospitalizations and fatalities in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cession. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit vaping intentions. Leveraging large language models including both the latest GPT-4 and traditional BERT-based language models for sentence-level quit-vaping intention prediction tasks, this study compares the outcomes of these models against human annotations. Notably, when compared to human evaluators, GPT-4 model demonstrates superior consistency in adhering to annotation guidelines and processes, showcasing advanced capabilities to detect nuanced user quit-vaping intentions that human evaluators might overlook. These preliminary findings emphasize the potential of GPT-4 in enhancing the accuracy and reliability of social media data analysis, especially in identifying subtle users' intentions that may elude human detection. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC. △ Less

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.15549 [pdf, other]

PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models

Authors: Shashi Kant Gupta, Aditya Basu, Mauro Nievas, Jerrin Thomas, Nathan Wolfrath, Adhitya Ramamurthi, Bradley Taylor, Anai N. Kothari, Regina Schwind, Therica M. Miller, Sorena Nadaf-Rahrov, Yanshan Wang, Hrituraj Singh

Abstract: Clinical trial matching is the task of identifying trials for which patients may be potentially eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. This process is manual, time-intensive, and challenging to scale up, resulting in many patients miss… ▽ More Clinical trial matching is the task of identifying trials for which patients may be potentially eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. This process is manual, time-intensive, and challenging to scale up, resulting in many patients missing out on potential therapeutic options. Recent advancements in Large Language Models (LLMs) have made automating patient-trial matching possible, as shown in multiple concurrent research studies. However, the current approaches are confined to constrained, often synthetic datasets that do not adequately mirror the complexities encountered in real-world medical data. In this study, we present the first, end-to-end large-scale empirical evaluation of clinical trial matching using real-world EHRs. Our study showcases the capability of LLMs to accurately match patients with appropriate clinical trials. We perform experiments with proprietary LLMs, including GPT-4 and GPT-3.5, as well as our custom fine-tuned model called OncoLLM and show that OncoLLM, despite its significantly smaller size, not only outperforms GPT-3.5 but also matches the performance of qualified medical doctors. All experiments were carried out on real-world EHRs that include clinical notes and available clinical trials from a single cancer center in the United States. △ Less

Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: 30 Pages, 8 Figures, Supplementary Work Attached

arXiv:2404.13191 [pdf, other]

Action Contextualization: Adaptive Task Planning and Action Tuning using Large Language Models

Authors: Sthithpragya Gupta, Kunpeng Yao, Loïc Niederhauser, Aude Billard

Abstract: Large Language Models (LLMs) present a promising frontier in robotic task planning by leveraging extensive human knowledge. Nevertheless, the current literature often overlooks the critical aspects of adaptability and error correction within robotic systems. This work aims to overcome this limitation by enabling robots to modify their motion strategies and select the most suitable task plans based… ▽ More Large Language Models (LLMs) present a promising frontier in robotic task planning by leveraging extensive human knowledge. Nevertheless, the current literature often overlooks the critical aspects of adaptability and error correction within robotic systems. This work aims to overcome this limitation by enabling robots to modify their motion strategies and select the most suitable task plans based on the context. We introduce a novel framework termed action contextualization, aimed at tailoring robot actions to the precise requirements of specific tasks, thereby enhancing adaptability through applying LLM-derived contextual insights. Our proposed motion metrics guarantee the feasibility and efficiency of adjusted motions, which evaluate robot performance and eliminate planning redundancies. Moreover, our framework supports online feedback between the robot and the LLM, enabling immediate modifications to the task plans and corrections of errors. Our framework has achieved an overall success rate of 81.25% through extensive validation. Finally, integrated with dynamic system (DS)-based robot controllers, the robotic arm-hand system demonstrates its proficiency in autonomously executing LLM-generated motion plans for sequential table-clearing tasks, rectifying errors without human intervention, and completing tasks, showcasing robustness against external disturbances. Our proposed framework features the potential to be integrated with modular control approaches, significantly enhancing robots' adaptability and autonomy in sequential task execution. △ Less

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.11752 [pdf]

Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions

Authors: Nazia Tasnim, Sujan Sen Gupta, Md. Istiak Hossain Shihab, Fatiha Islam Juee, Arunima Tahsin, Pritom Ghum, Kanij Fatema, Marshia Haque, Wasema Farzana, Prionti Nasir, Ashique KhudaBukhsh, Farig Sadeque, Asif Sushmit

Abstract: Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies exhibit a phenomenon characterized by strong bonds within their own groups and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the firs… ▽ More Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies exhibit a phenomenon characterized by strong bonds within their own groups and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content accompanying the largest collection (13K raw sentences) of social media interactions that fall under the definition of four major violence class and their 16 coarse expressions. Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists. By presenting data statistics and benchmarking performance using this dataset, we have determined that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text. Moreover, we have substantiated the effectiveness of fine-tuning language models in identifying violent comments by conducting preliminary benchmarking on the state-of-the-art Bangla deep learning model. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09067 [pdf, other]

Exploring Explainability in Video Action Recognition

Authors: Avinab Saha, Shashank Gupta, Sravan Kumar Ankireddy, Karl Chahine, Joydeep Ghosh

Abstract: Image Classification and Video Action Recognition are perhaps the two most foundational tasks in computer vision. Consequently, explaining the inner workings of trained deep neural networks is of prime importance. While numerous efforts focus on explaining the decisions of trained deep neural networks in image classification, exploration in the domain of its temporal version, video action recognit… ▽ More Image Classification and Video Action Recognition are perhaps the two most foundational tasks in computer vision. Consequently, explaining the inner workings of trained deep neural networks is of prime importance. While numerous efforts focus on explaining the decisions of trained deep neural networks in image classification, exploration in the domain of its temporal version, video action recognition, has been scant. In this work, we take a deeper look at this problem. We begin by revisiting Grad-CAM, one of the popular feature attribution methods for Image Classification, and its extension to Video Action Recognition tasks and examine the method's limitations. To address these, we introduce Video-TCAV, by building on TCAV for Image Classification tasks, which aims to quantify the importance of specific concepts in the decision-making process of Video Action Recognition models. As the scalable generation of concepts is still an open problem, we propose a machine-assisted approach to generate spatial and spatiotemporal concepts relevant to Video Action Recognition for testing Video-TCAV. We then establish the importance of temporally-varying concepts by demonstrating the superiority of dynamic spatiotemporal concepts over trivial spatial concepts. In conclusion, we introduce a framework for investigating hypotheses in action recognition and quantitatively testing them, thus advancing research in the explainability of deep neural networks used in video action recognition. △ Less

Submitted 13 April, 2024; originally announced April 2024.

Comments: 6 pages, 10 figures, Accepted to the 3rd Explainable AI for Computer Vision (XAI4CV) Workshop at CVPR 2024

arXiv:2404.07308 [pdf, other]

Spatial Transfer Learning for Estimating PM2.5 in Data-poor Regions

Authors: Shrey Gupta, Yongbee Park, Jianzhao Bi, Suyash Gupta, Andreas Züfle, Avani Wildani, Yang Liu

Abstract: Air pollution, especially particulate matter 2.5 (PM2.5), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can be leveraged to solve this problem, as they use alternate data sources to gain knowledge (i.e., data from data-rich regions). However, current transfer learning method… ▽ More Air pollution, especially particulate matter 2.5 (PM2.5), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can be leveraged to solve this problem, as they use alternate data sources to gain knowledge (i.e., data from data-rich regions). However, current transfer learning methodologies do not account for dependencies between the source and the target domains. We recognize this transfer problem as spatial transfer learning and propose a new feature named Latent Dependency Factor (LDF) that captures spatial and semantic dependencies of both domains and is subsequently added to the feature spaces of the domains. We generate LDF using a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Our experiments show that transfer learning models using LDF have a 19.34% improvement over the baselines. We additionally support our experiments with qualitative findings. △ Less

Submitted 22 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted for publication at ECML-PKDD 2024

arXiv:2404.06680 [pdf, other]

Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology

Authors: Shashi Kant Gupta, Aditya Basu, Bradley Taylor, Anai Kothari, Hrituraj Singh

Abstract: Retrieving information from EHR systems is essential for answering specific questions about patient journeys and improving the delivery of clinical care. Despite this fact, most EHR systems still rely on keyword-based searches. With the advent of generative large language models (LLMs), retrieving information can lead to better search and summarization capabilities. Such retrievers can also feed R… ▽ More Retrieving information from EHR systems is essential for answering specific questions about patient journeys and improving the delivery of clinical care. Despite this fact, most EHR systems still rely on keyword-based searches. With the advent of generative large language models (LLMs), retrieving information can lead to better search and summarization capabilities. Such retrievers can also feed Retrieval-augmented generation (RAG) pipelines to answer any query. However, the task of retrieving information from EHR real-world clinical data contained within EHR systems in order to solve several downstream use cases is challenging due to the difficulty in creating query-document support pairs. We provide a blueprint for creating such datasets in an affordable manner using large language models. Our method results in a retriever that is 30-50 F-1 points better than propriety counterparts such as Ada and Mistral for oncology data elements. We further compare our model, called Onco-Retriever, against fine-tuned PubMedBERT model as well. We conduct an extensive manual evaluation on real-world EHR data along with latency analysis of the different models and provide a path forward for healthcare organizations to build domain-specific retrievers. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: 18 pages

arXiv:2403.20145 [pdf, other]

Fine-tuning Large Language Models for Automated Diagnostic Screening Summaries

Authors: Manjeet Yadav, Nilesh Kumar Sahu, Mudita Chaturvedi, Snehil Gupta, Haroon R Lone

Abstract: Improving mental health support in developing countries is a pressing need. One potential solution is the development of scalable, automated systems to conduct diagnostic screenings, which could help alleviate the burden on mental health professionals. In this work, we evaluate several state-of-the-art Large Language Models (LLMs), with and without fine-tuning, on our custom dataset for generating… ▽ More Improving mental health support in developing countries is a pressing need. One potential solution is the development of scalable, automated systems to conduct diagnostic screenings, which could help alleviate the burden on mental health professionals. In this work, we evaluate several state-of-the-art Large Language Models (LLMs), with and without fine-tuning, on our custom dataset for generating concise summaries from mental state examinations. We rigorously evaluate four different models for summary generation using established ROUGE metrics and input from human evaluators. The results highlight that our top-performing fine-tuned model outperforms existing models, achieving ROUGE-1 and ROUGE-L values of 0.810 and 0.764, respectively. Furthermore, we assessed the fine-tuned model's generalizability on a publicly available D4 dataset, and the outcomes were promising, indicating its potential applicability beyond our custom dataset. △ Less

Submitted 4 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.16428 [pdf, other]

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the… ▽ More We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the head movement. To this end, we designed the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks. Our analysis demonstrates the effectiveness of addressing distortion specific to egocentric cameras, adopting high-capacity transformers to learn complex hand-object interactions, and fusing predictions from different views. Our study further reveals challenging scenarios intractable with state-of-the-art methods, such as fast hand motion, object reconstruction from narrow egocentric views, and close contact between two hands and objects. Our efforts will enrich the community's knowledge foundation and facilitate future hand studies on egocentric hand-object interactions. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16336 [pdf, other]

Predictive Inference in Multi-environment Scenarios

Authors: John C. Duchi, Suyash Gupta, Kuanhao Jiang, Pragya Sur

Abstract: We address the challenge of constructing valid confidence intervals and sets in problems of prediction across multiple environments. We investigate two types of coverage suitable for these problems, extending the jackknife and split-conformal methods to show how to obtain distribution-free coverage in such non-traditional, hierarchical data-generating scenarios. Our contributions also include exte… ▽ More We address the challenge of constructing valid confidence intervals and sets in problems of prediction across multiple environments. We investigate two types of coverage suitable for these problems, extending the jackknife and split-conformal methods to show how to obtain distribution-free coverage in such non-traditional, hierarchical data-generating scenarios. Our contributions also include extensions for settings with non-real-valued responses and a theory of consistency for predictive inference in these general problems. We demonstrate a novel resizing method to adapt to problem difficulty, which applies both to existing approaches for predictive inference with hierarchical data and the methods we develop; this reduces prediction set sizes using limited information from the test environment, a key to the methods' practical performance, which we evaluate through neurochemical sensing and species classification datasets. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.10948 [pdf, other]

Real-to-Sim Adaptation via High-Fidelity Simulation to Control a Wheeled-Humanoid Robot with Unknown Dynamics

Authors: Donghoon Baek, Youngwoo Sim, Amartya Purushottam, Saurabh Gupta, Joao Ramos

Abstract: Model-based controllers using a linearized model around the system's equilibrium point is a common approach in the control of a wheeled humanoid due to their less computational load and ease of stability analysis. However, controlling a wheeled humanoid robot while it lifts an unknown object presents significant challenges, primarily due to the lack of knowledge in object dynamics. This paper pres… ▽ More Model-based controllers using a linearized model around the system's equilibrium point is a common approach in the control of a wheeled humanoid due to their less computational load and ease of stability analysis. However, controlling a wheeled humanoid robot while it lifts an unknown object presents significant challenges, primarily due to the lack of knowledge in object dynamics. This paper presents a framework designed for predicting the new equilibrium point explicitly to control a wheeled-legged robot with unknown dynamics. We estimated the total mass and center of mass of the system from its response to initially unknown dynamics, then calculated the new equilibrium point accordingly. To avoid using additional sensors (e.g., force torque sensor) and reduce the effort of obtaining expensive real data, a data-driven approach is utilized with a novel real-to-sim adaptation. A more accurate nonlinear dynamics model, offering a closer representation of real-world physics, is injected into a rigid-body simulation for real-to-sim adaptation. The nonlinear dynamics model parameters were optimized using Particle Swarm Optimization. The efficacy of this framework was validated on a physical wheeled inverted pendulum, a simplified model of a wheeled-legged robot. The experimental results indicate that employing a more precise analytical model with optimized parameters significantly reduces the gap between simulation and reality, thus improving the efficiency of a model-based controller in controlling a wheeled robot with unknown dynamics. △ Less

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10270 [pdf, other]

Discrete functional inequalities on lattice graphs

Authors: Shubham Gupta

Abstract: In this thesis, we study problems at the interface of analysis and discrete mathematics. We discuss analogues of well known Hardy-type inequalities and Rearrangement inequalities on the lattice graphs $\mathbb{Z}^d$, with a particular focus on behaviour of sharp constants and optimizers.In the first half of the thesis, we analyse Hardy inequalities on $\mathbb{Z}^d$, first for $d=1$ and then for… ▽ More In this thesis, we study problems at the interface of analysis and discrete mathematics. We discuss analogues of well known Hardy-type inequalities and Rearrangement inequalities on the lattice graphs $\mathbb{Z}^d$, with a particular focus on behaviour of sharp constants and optimizers.In the first half of the thesis, we analyse Hardy inequalities on $\mathbb{Z}^d$, first for $d=1$ and then for $d \geq 3$. We prove a sharp weighted Hardy inequality on integers with power weights of the form $n^α$. This is done via two different methods, namely super-solution and Fourier method. We also use Fourier method to prove a weighted Hardy type inequality for higher order operators. After discussing the one dimensional case, we study the Hardy inequality in higher dimensions ($d \geq 3$). In particular, we compute the asymptotic behaviour of the sharp constant in the discrete Hardy inequality, as $d \rightarrow \infty$. This is done by converting the inequality into a continuous Hardy-type inequality on a torus for functions having zero average. These continuous inequalities are new and interesting in themselves. In the second half, we focus our attention on analogues of Rearrangement inequalities on lattice graphs. We begin by analysing the situation in dimension one. We define various notions of rearrangements and prove the corresponding Polya-Szegő inequality. These inequalities are also applied to prove some weighted Hardy inequalities on integers. Finally, we study Rearrangement inequalities (Polya-Szegő) on general graphs, with a particular focus on lattice graphs $\mathbb{Z}^d$, for $d \geq 2$. We develop a framework to study these inequalities, using which we derive concrete results in dimension two. In particular, these results develop connections between Polya-Szegő inequality and various isoperimetric inequalities on graphs. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: PhD thesis, Imperial College London, 140 pages

arXiv:2403.08297 [pdf]

doi 10.1021/acsphotonics.3c00473

Semi-Transparent Image Sensors for Eye-Tracking Applications

Authors: Gabriel Mercier, Emre O. Polat, Shengtai Shi, Shuchi Gupta, Gerasimos Konstantatos, Stijn Goossens, Frank H. L. Koppens

Abstract: Image sensors hold a pivotal role in society due to their ability to capture vast amounts of information. Traditionally, image sensors are opaque due to light absorption in both the pixels and the read-out electronics that are stacked on top of each other. Making image sensors visibly transparent would have a far-reaching impact in numerous areas such as human-computer interfaces, smart displays,… ▽ More Image sensors hold a pivotal role in society due to their ability to capture vast amounts of information. Traditionally, image sensors are opaque due to light absorption in both the pixels and the read-out electronics that are stacked on top of each other. Making image sensors visibly transparent would have a far-reaching impact in numerous areas such as human-computer interfaces, smart displays, and both augmented and virtual reality. In this paper, we present the development and analysis of the first semi-transparent image sensor and its applicability as an eye-tracking device. The device consists of an 8x8 array of semi-transparent photodetectors and electrodes disposed on a fully transparent substrate. Each pixel of the array has a size of 60 x 140 μm and an optical transparency of 85-95%. Pixels have a high sensitivity, with more than 90% of them showing a noise equivalent irradiance < 10-4 W/m2 for wavelengths of 637 nm. As the semi-transparent photodetectors have a large amount of built-in gain, the opaque read-out electronics can be placed far away from the detector array to ensure maximum transparency and fill factor. Indeed, the operation and appearance of transparent image sensors present a fundamental shift in how we think about cameras and imaging, as these devices can be concealed in plain sight. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08176 [pdf]

Sentiment-aware Enhancements of PageRank-based Citation Metric, Impact Factor, and H-index for Ranking the Authors of Scholarly Articles

Authors: Shikha Gupta, Animesh Kumar

Abstract: Heretofore, the only way to evaluate an author has been frequency-based citation metrics that assume citations to be of a neutral sentiment. However, considering the sentiment behind citations aids in a better understanding of the viewpoints of fellow researchers for the scholarly output of an author. Heretofore, the only way to evaluate an author has been frequency-based citation metrics that assume citations to be of a neutral sentiment. However, considering the sentiment behind citations aids in a better understanding of the viewpoints of fellow researchers for the scholarly output of an author. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: The paper has been accepted for publication in Computer Science journal: http://journals.agh.edu.pl/csci}

arXiv:2403.05931 [pdf]

Thread Detection and Response Generation using Transformers with Prompt Optimisation

Authors: Kevin Joshua T, Arnav Agarwal, Shriya Sanjay, Yash Sarda, John Sahaya Rani Alex, Saurav Gupta, Sushant Kumar, Vishwanath Kamath

Abstract: Conversational systems are crucial for human-computer interaction, managing complex dialogues by identifying threads and prioritising responses. This is especially vital in multi-party conversations, where precise identification of threads and strategic response prioritisation ensure efficient dialogue management. To address these challenges an end-to-end model that identifies threads and prioriti… ▽ More Conversational systems are crucial for human-computer interaction, managing complex dialogues by identifying threads and prioritising responses. This is especially vital in multi-party conversations, where precise identification of threads and strategic response prioritisation ensure efficient dialogue management. To address these challenges an end-to-end model that identifies threads and prioritises their response generation based on the importance was developed, involving a systematic decomposition of the problem into discrete components - thread detection, prioritisation, and performance optimisation which was meticulously analysed and optimised. These refined components seamlessly integrate into a unified framework, in conversational systems. Llama2 7b is used due to its high level of generalisation but the system can be updated with any open source Large Language Model(LLM). The computational capabilities of the Llama2 model was augmented by using fine tuning methods and strategic prompting techniques to optimise the model's performance, reducing computational time and increasing the accuracy of the model. The model achieves up to 10x speed improvement, while generating more coherent results compared to existing models. △ Less

Submitted 9 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures, submitted to 2024 IEEE International Conference on Signal Processing and Communications (SPCOM)

ACM Class: I.2.7; I.2.6

arXiv:2403.04265 [pdf, other]

Conflict and Fairness in Resource Allocation

Authors: Susobhan Bandopadhyay, Aritra Banik, Sushmita Gupta, Pallavi Jain, Abhishek Sahu, Saket Saurabh, Prafullkumar Tale

Abstract: In the standard model of fair allocation of resources to agents, every agent has some utility for every resource, and the goal is to assign resources to agents so that the agents' welfare is maximized. Motivated by job scheduling, interest in this problem dates back to the work of Deuermeyer et al. [SIAM J. on Algebraic Discrete Methods'82]. Recent works consider the compatibility between resource… ▽ More In the standard model of fair allocation of resources to agents, every agent has some utility for every resource, and the goal is to assign resources to agents so that the agents' welfare is maximized. Motivated by job scheduling, interest in this problem dates back to the work of Deuermeyer et al. [SIAM J. on Algebraic Discrete Methods'82]. Recent works consider the compatibility between resources and assign only mutually compatible resources to an agent. We study a fair allocation problem in which we are given a set of agents, a set of resources, a utility function for every agent over a set of resources, and a {\it conflict graph} on the set of resources (where an edge denotes incompatibility). The goal is to assign resources to the agents such that $(i)$ the set of resources allocated to an agent are compatible with each other, and $(ii)$ the minimum satisfaction of an agent is maximized, where the satisfaction of an agent is the sum of the utility of the assigned resources. Chiarelli et al. [Algorithmica'22] explore this problem from the classical complexity perspective to draw the boundary between the cases that are polynomial-time solvable and those that are \NP-hard. In this article, we study the parameterized complexity of the problem (and its variants) by considering several natural and structural parameters. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2309.04995

arXiv:2403.00584 [pdf, other]

Generalized User Representations for Transfer Learning

Authors: Ghazal Fazelnia, Sanket Gupta, Claire Keum, Mark Koh, Ian Anderson, Mounia Lalmas

Abstract: We present a novel framework for user representation in large-scale recommender systems, aiming at effectively representing diverse user taste in a generalized manner. Our approach employs a two-stage methodology combining representation learning and transfer learning. The representation learning model uses an autoencoder that compresses various user features into a representation space. In the se… ▽ More We present a novel framework for user representation in large-scale recommender systems, aiming at effectively representing diverse user taste in a generalized manner. Our approach employs a two-stage methodology combining representation learning and transfer learning. The representation learning model uses an autoencoder that compresses various user features into a representation space. In the second stage, downstream task-specific models leverage user representations via transfer learning instead of curating user features individually. We further augment this methodology on the representation's input features to increase flexibility and enable reaction to user events, including new user experiences, in Near-Real Time. Additionally, we propose a novel solution to manage deployment of this framework in production models, allowing downstream models to work independently. We validate the performance of our framework through rigorous offline and online experiments within a large-scale system, showcasing its remarkable efficacy across multiple evaluation tasks. Finally, we show how the proposed framework can significantly reduce infrastructure costs compared to alternative approaches. △ Less

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.19462 [pdf, other]

Accelerating materials discovery for polymer solar cells: Data-driven insights enabled by natural language processing

Authors: Pranav Shetty, Aishat Adeboye, Sonakshi Gupta, Chao Zhang, Rampi Ramprasad

Abstract: We present a simulation of various active learning strategies for the discovery of polymer solar cell donor/acceptor pairs using data extracted from the literature spanning $\sim$20 years by a natural language processing pipeline. While data-driven methods have been well established to discover novel materials faster than Edisonian trial-and-error approaches, their benefits have not been quantifie… ▽ More We present a simulation of various active learning strategies for the discovery of polymer solar cell donor/acceptor pairs using data extracted from the literature spanning $\sim$20 years by a natural language processing pipeline. While data-driven methods have been well established to discover novel materials faster than Edisonian trial-and-error approaches, their benefits have not been quantified for material discovery problems that can take decades. Our approach demonstrates a potential reduction in discovery time by approximately 75 %, equivalent to a 15 year acceleration in material innovation. Our pipeline enables us to extract data from greater than 3300 papers which is $\sim$5 times larger and therefore more diverse than similar data sets reported by others. We also trained machine learning models to predict the power conversion efficiency and used our model to identify promising donor-acceptor combinations that are as yet unreported. We thus demonstrate a pipeline that goes from published literature to extracted material property data which in turn is used to obtain data-driven insights. Our insights include active learning strategies that can be used to train strong predictive models of material properties or be robust to the initial material system used. This work provides a valuable framework for data-driven research in materials science. △ Less

Submitted 21 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.17768 [pdf, other]

Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning

Authors: Xiaoyu Zhang, Matthew Chang, Pranav Kumar, Saurabh Gupta

Abstract: A common failure mode for policies trained with imitation is compounding execution errors at test time. When the learned policy encounters states that are not present in the expert demonstrations, the policy fails, leading to degenerate behavior. The Dataset Aggregation, or DAgger approach to this problem simply collects more data to cover these failure states. However, in practice, this is often… ▽ More A common failure mode for policies trained with imitation is compounding execution errors at test time. When the learned policy encounters states that are not present in the expert demonstrations, the policy fails, leading to degenerate behavior. The Dataset Aggregation, or DAgger approach to this problem simply collects more data to cover these failure states. However, in practice, this is often prohibitively expensive. In this work, we propose Diffusion Meets DAgger (DMD), a method to reap the benefits of DAgger without the cost for eye-in-hand imitation learning problems. Instead of collecting new samples to cover out-of-distribution states, DMD uses recent advances in diffusion models to synthesize these samples. This leads to robust performance from few demonstrations. We compare DMD against behavior cloning baseline across four tasks: pushing, stacking, pouring, and shirt hanging. In pushing, DMD achieves 80% success rate with as few as 8 expert demonstrations, where naive behavior cloning reaches only 20%. In stacking, DMD succeeds on average 92% of the time across 5 cups, versus 40% for BC. When pouring coffee beans, DMD transfers to another cup successfully 80% of the time. Finally, DMD attains 90% success rate for hanging shirt on a clothing rack. △ Less

Submitted 5 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by Robotics: Science and Systems (RSS) 2024. project website with video, see https://sites.google.com/view/diffusion-meets-dagger

arXiv:2402.17767 [pdf, other]

Opening Cabinets and Drawers in the Real World using a Commodity Mobile Manipulator

Authors: Arjun Gupta, Michelle Zhang, Rishik Sathua, Saurabh Gupta

Abstract: Pulling open cabinets and drawers presents many difficult technical challenges in perception (inferring articulation parameters for objects from onboard sensors), planning (producing motion plans that conform to tight task constraints), and control (making and maintaining contact while applying forces on the environment). In this work, we build an end-to-end system that enables a commodity mobile… ▽ More Pulling open cabinets and drawers presents many difficult technical challenges in perception (inferring articulation parameters for objects from onboard sensors), planning (producing motion plans that conform to tight task constraints), and control (making and maintaining contact while applying forces on the environment). In this work, we build an end-to-end system that enables a commodity mobile manipulator (Stretch RE2) to pull open cabinets and drawers in diverse previously unseen real world environments. We conduct 4 days of real world testing of this system spanning 31 different objects from across 13 different real world environments. Our system achieves a success rate of 61% on opening novel cabinets and drawers in unseen environments zero-shot. An analysis of the failure modes suggests that errors in perception are the most significant challenge for our system. We will open source code and models for others to replicate and build upon our system. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: Project webpage: https://arjung128.github.io/opening-cabinets-and-drawers

arXiv:2402.17343 [pdf, other]

Enhanced Bayesian Optimization via Preferential Modeling of Abstract Properties

Authors: Arun Kumar A V, Alistair Shilton, Sunil Gupta, Santu Rana, Stewart Greenhill, Svetha Venkatesh

Abstract: Experimental (design) optimization is a key driver in designing and discovering new products and processes. Bayesian Optimization (BO) is an effective tool for optimizing expensive and black-box experimental design processes. While Bayesian optimization is a principled data-driven approach to experimental optimization, it learns everything from scratch and could greatly benefit from the expertise… ▽ More Experimental (design) optimization is a key driver in designing and discovering new products and processes. Bayesian Optimization (BO) is an effective tool for optimizing expensive and black-box experimental design processes. While Bayesian optimization is a principled data-driven approach to experimental optimization, it learns everything from scratch and could greatly benefit from the expertise of its human (domain) experts who often reason about systems at different abstraction levels using physical properties that are not necessarily directly measured (or measurable). In this paper, we propose a human-AI collaborative Bayesian framework to incorporate expert preferences about unmeasured abstract properties into the surrogate modeling to further boost the performance of BO. We provide an efficient strategy that can also handle any incorrect/misleading expert bias in preferential judgments. We discuss the convergence behavior of our proposed framework. Our experimental results involving synthetic functions and real-world datasets show the superiority of our method against the baselines. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: 19 Pages, 6 Figures

arXiv:2402.16237 [pdf, other]

Active Level Set Estimation for Continuous Search Space with Theoretical Guarantee

Authors: Giang Ngo, Dang Nguyen, Dat Phan-Trong, Sunil Gupta

Abstract: A common problem encountered in many real-world applications is level set estimation where the goal is to determine the region in the function domain where the function is above or below a given threshold. When the function is black-box and expensive to evaluate, the level sets need to be found in a minimum set of function evaluations. Existing methods often assume a discrete search space with a f… ▽ More A common problem encountered in many real-world applications is level set estimation where the goal is to determine the region in the function domain where the function is above or below a given threshold. When the function is black-box and expensive to evaluate, the level sets need to be found in a minimum set of function evaluations. Existing methods often assume a discrete search space with a finite set of data points for function evaluations and estimating the level sets. When applied to a continuous search space, these methods often need to first discretize the space which leads to poor results while needing high computational time. While some methods cater for the continuous setting, they still lack a proper guarantee for theoretical convergence. To address this problem, we propose a novel algorithm that does not need any discretization and can directly work in continuous search spaces. Our method suggests points by constructing an acquisition function that is defined as a measure of confidence of the function being higher or lower than the given threshold. A theoretical analysis for the convergence of the algorithm to an accurate solution is provided. On multiple synthetic and real-world datasets, our algorithm successfully outperforms state-of-the-art methods. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Showing 1–50 of 688 results for author: Gupta, S