Search | arXiv e-print repository

COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation

Authors: Munish Monga, Sachin Kumar Giroh, Ankit Jha, Mainak Singha, Biplab Banerjee, Jocelyn Chanussot

Abstract: Multi-Target Domain Adaptation (MTDA) entails learning domain-invariant information from a single source domain and applying it to multiple unlabeled target domains. Yet, existing MTDA methods predominantly focus on addressing domain shifts within visual features, often overlooking semantic features and struggling to handle unknown classes, resulting in what is known as Open-Set (OS) MTDA. While large-scale vision-language foundation models like CLIP show promise, their potential for MTDA remains largely unexplored. This paper introduces COSMo, a novel method that learns domain-agnostic prompts through source domain-guided prompt learning to tackle the MTDA problem in the prompt space. By leveraging a domain-specific bias network and separate prompts for known and unknown classes, COSMo effectively adapts across domain and class shifts. To the best of our knowledge, COSMo is the first method to address Open-Set Multi-Target DA (OSMTDA), offering a more realistic representation of real-world scenarios and addressing the challenges of both open-set and multi-target DA. COSMo demonstrates an average improvement of $5.1\%$ across three challenging datasets: Mini-DomainNet, Office-31, and Office-Home, compared to other related DA methods adapted to operate within the OSMTDA setting. Code is available at: https://github.com/munish30monga/COSMo

Submitted 31 August, 2024; originally announced September 2024.

Comments: Accepted in BMVC 2024

arXiv:2409.00354 [pdf, other]

A parameter uniform hybrid approach for singularly perturbed two-parameter parabolic problem with discontinuous data

Authors: Nirmali Roy, Anuradha Jha

Abstract: In this article, we address singularly perturbed two-parameter parabolic problems of the reaction-convection-diffusion type in two dimensions. These problems exhibit discontinuities in the source term and convection coefficient at particular domain points, which result in the formation of interior layers. The presence of two perturbation parameters leads to the formation of boundary layers with varying widths. Our primary focus is to address these layers and develop a scheme that is uniformly convergent. So we propose a hybrid monotone difference scheme for the spatial direction, implemented on a specially designed piece-wise uniform Shishkin mesh, combined with the Crank-Nicolson method on a uniform mesh for the temporal direction. The resulting scheme is proven to be uniformly convergent, with an order of almost two in the spatial direction and exactly two in the temporal direction. Numerical experiments support the theoretically proven higher order of convergence and show that our approach results in better accuracy and convergence compared to other existing methods in the literature.

Submitted 31 August, 2024; originally announced September 2024.

arXiv:2408.05820 [pdf, other]

The moments of split greatest common divisors

Authors: Abhishek Jha, Ayan Nath, Emanuele Tron

Abstract: Sequences of the form $(\gcd(u_n,v_n))_{n \in \mathbb N}$, with $(u_n)_n$, $(v_n)_n$ sums of $S$-units, have been considered by several authors. The study of $\gcd(n,u_n)$ corresponds, following Silverman, to divisibility sequences arising from the split algebraic group $\mathbb G_{\mathrm{a}} \times \mathbb G_{\mathrm{m}}$; in this case, Sanna determined all asymptotic moments of the arithmetic function $\log\,\gcd (n,u_n)$ when $(u_n)_n$ is a Lucas sequence. Here, we characterize the asymptotic behavior of the moments themselves $\sum_{n \leq x}\,\gcd(n,u_n)^λ$, thus solving the moment problem for $\mathbb G_{\mathrm{a}} \times \mathbb G_{\mathrm{m}}$. We give both unconditional and conditional results, the latter only relying on standard conjectures in analytic number theory.

Submitted 15 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

Comments: 15 pages, 1 figure

MSC Class: 11N56; 11B37

arXiv:2407.17766 [pdf, other]

Strategic Pseudo-Goal Perturbation for Deadlock-Free Multi-Agent Navigation in Social Mini-Games

Authors: Abhishek Jha, Tanishq Gupta, Sumit Singh Rawat, Girish Kumar

Abstract: This work introduces a Strategic Pseudo-Goal Perturbation (SPGP) technique, a novel approach to resolve deadlock situations in multi-agent navigation scenarios. Leveraging the robust framework of Safety Barrier Certificates, our method integrates a strategic perturbation mechanism that guides agents through social mini-games where deadlock and collision occur frequently. The method adopts a strategic calculation process where agents, upon encountering a deadlock, select a pseudo-goal within a predefined radius around the current position to resolve the deadlock among agents. The calculation is based on a controlled strategic algorithm, ensuring that deviation towards the pseudo-goal is both purposeful and effective in resolving the deadlock. Once the agent reaches the pseudo-goal, it resumes the path towards the original goal, thereby enhancing navigational efficiency and safety. Experimental results demonstrate SPGP's efficacy in reducing deadlock instances and improving overall system throughput in a variety of multi-agent navigation scenarios.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.13265 [pdf, other]

Capillary lubrication of a spherical particle near a fluid interface

Authors: Aditya Jha, Yacine Amarouchene, Thomas Salez

Abstract: The lubricated motion of an object near a deformable boundary presents striking subtleties arising from the coupling between the elasticity of the boundary and lubricated flow, including but not limited to the emergence of a lift force acting on the object despite the zero Reynolds number. In this study, we characterize the hydrodynamic forces and torques felt by a sphere translating in close proximity to a fluid interface, separating the viscous medium of the sphere's motion from an infinitely-more-viscous medium. We employ lubrication theory and perform a perturbation analysis in capillary compliance. The dominant response of the interface owing to surface tension results in a long-ranged interface deformation, which leads to a modification of the forces and torques with respect to the rigid reference case, that we characterise in detail with scaling arguments and numerical integrations.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.07858 [pdf, other]

FACTS About Building Retrieval Augmented Generation-based Chatbots

Authors: Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, et al. (13 additional authors not shown)

Abstract: Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures, 2 tables, Preprint submission to ACM CIKM 2024

arXiv:2407.04319 [pdf, other]

Singular viscoelastic perturbation to soft lubrication

Authors: Bharti Bharti, Quentin Ferreira, Aditya Jha, Andreas Carlson, David S. Dean, Yacine Amarouchene, Tak Shing Chan, Thomas Salez

Abstract: Soft lubrication has been shown to drastically affect the mobility of an object immersed in a viscous fluid in the vicinity of a purely elastic wall. In this theoretical study, we develop a minimal model incorporating viscoelasticity, carrying out a perturbation analysis in both the elastic deformation of the wall and its viscous damping. Our approach reveals the singular-perturbation nature of viscoelasticity to soft lubrication. Numerical resolution of the resulting non-linear, singular and coupled equations of motion reveals peculiar effects of viscoelasticity on confined colloidal mobility, opening the way towards the description of complex migration scenarios near realistic polymeric substrates and biological membranes.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04207 [pdf, other]

Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning

Authors: Mainak Singha, Ankit Jha, Divyam Gupta, Pranav Singla, Biplab Banerjee

Abstract: We address the challenges inherent in sketch-based image retrieval (SBIR) across various settings, including zero-shot SBIR, generalized zero-shot SBIR, and fine-grained zero-shot SBIR, by leveraging the vision-language foundation model CLIP. While recent endeavors have employed CLIP to enhance SBIR, these approaches predominantly follow uni-modal prompt processing and fail to fully exploit CLIP's integrated visual and textual capabilities. To bridge this gap, we introduce SpLIP, a novel multi-modal prompt learning scheme designed to operate effectively with frozen CLIP backbones. We diverge from existing multi-modal prompting methods that treat visual and textual prompts independently or integrate them in a limited fashion, leading to suboptimal generalization. SpLIP implements a bi-directional prompt-sharing strategy that enables mutual knowledge exchange between CLIP's visual and textual encoders, fostering a more cohesive and synergistic prompt processing mechanism that significantly reduces the semantic gap between the sketch and photo embeddings. In addition to pioneering multi-modal prompt learning, we propose two innovative strategies for further refining the embedding space. The first is an adaptive margin generation for the sketch-photo triplet loss, regulated by CLIP's class textual embeddings. The second introduces a novel task, termed conditional cross-modal jigsaw, aimed at enhancing fine-grained sketch-photo alignment by implicitly modeling sketches' viable patch arrangement using knowledge of unshuffled photos.
Our comprehensive experimental evaluations across multiple benchmarks demonstrate the superior performance of SpLIP in all three SBIR scenarios. Project page: https://mainaksingha01.github.io/SpLIP/

Submitted 22 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted in ECCV 2024

arXiv:2407.00534 [pdf]

Blockchain based Decentralized Petition System

Authors: Jagdeep Kaur, Kevin Antony, Nikhil Pujar, Ankit Jha

Abstract: A decentralized online petition system enables individuals or groups to create, sign, and share petitions without a central authority. Using blockchain technology, these systems ensure the integrity and transparency of the petition process by recording every signature or action on the blockchain, making alterations or deletions impossible. This provides a permanent, tamper-proof record of the petition's progress. Such systems allow users to bypass traditional intermediaries like government or social media platforms, fostering more democratic and transparent decision-making. This paper reviews research on petition systems, highlighting the shortcomings of existing systems such as lack of accountability, vulnerability to hacking, and security issues. The proposed blockchain-based implementation aims to overcome these challenges. Decentralized voting systems have garnered interest recently due to their potential to provide secure and transparent voting platforms without intermediaries, addressing issues like voter fraud, manipulation, and trust in the electoral process. We propose a decentralized voting system web application using blockchain technology to ensure the integrity and security of the voting process. This system aims to provide a transparent, decentralized decision-making process that counts every vote while eliminating the need for centralized authorities. The paper presents an overview of the system architecture, design considerations, and implementation details, along with the potential benefits and limitations.
Finally, we discuss future research directions, examining the technical aspects of the application, including underlying algorithms and protocols. Our research aims to enhance the integrity and accessibility of democratic processes, improve security, and ensure fairness, transparency, and tamper-proofness.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.18508 [pdf]

Assessment of Clonal Hematopoiesis of Indeterminate Potential from Cardiac Magnetic Resonance Imaging using Deep Learning in a Cardio-oncology Population

Authors: Sangeon Ryu, Shawn Ahn, Jeacy Espinoza, Alokkumar Jha, Stephanie Halene, James S. Duncan, Jennifer M Kwan, Nicha C. Dvornek

Abstract: Background: We propose a novel method to identify who may likely have clonal hematopoiesis of indeterminate potential (CHIP), a condition characterized by the presence of somatic mutations in hematopoietic stem cells without detectable hematologic malignancy, using deep learning techniques. Methods: We developed a convolutional neural network (CNN) to predict CHIP status using 4 different views from standard delayed gadolinium-enhanced cardiac magnetic resonance imaging (CMR). We used 5-fold cross validation on 82 cardio-oncology patients to assess the performance of our model. Different algorithms were compared to find the optimal patient-level prediction method using the image-level CNN predictions. Results: We found that the best model had an area under the receiver operating characteristic curve of 0.85 and an accuracy of 82%. Conclusions: We conclude that a deep learning-based diagnostic approach for CHIP using CMR is promising.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.14172 [pdf, ps, other]

Dynamics of Phase Transition in Quark-Gluon Plasma Droplet Formation under Magnetic Field

Authors: Agam K. Jha, Aviral Srivastava

Abstract: Pre-existing density of states for a Quark-Gluon Phase, based on Thomas-Fermi and Bethe mode, is expanded by incorporation of new variables. Results from recent study indicate that perturbations in the form of a finite non-zero chemical potential T, B, dynamic thermal masses M and of course Temperature T are indeed vital to fully comprehend the formation and dynamics of QGP. Simulations depict an overall increase in the stability of QGP in the paradigm of the statistical model. On the top of Free Energy, Entropy and heat capacity are calculated for the phase transition. The overall qualitative behavior, of entropy or Heat Capacity determines the order of phase transition of the QGP. Investigation of order of phase transition is carried out in this study through Monte-Carlo based differential element, which ensures the inclusion of the randomness of the collisions at the particle colliders.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14153 [pdf, other]

On random classical marginal problems with applications to quantum information theory

Authors: Ankit Kumar Jha, Ion Nechita

Abstract: In this paper, we study random instances of the classical marginal problem. We encode the problem in a graph, where the vertices have assigned fixed binary probability distributions, and edges have assigned random bivariate distributions having the incident vertex distributions as marginals. We provide estimates on the probability that a joint distribution on the graph exists, having the bivariate edge distributions as marginals. Our study is motivated by Fine's theorem in quantum mechanics. We study in great detail the graphs corresponding to CHSH and Bell-Wigner scenarios providing ratios of volumes between the local and non-signaling polytopes.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.02374 [pdf]

Direct measurement of the viscocapillary lift force near a liquid interface

Authors: Hao Zhang, Zaicheng Zhang, Aditya Jha, Yacine Amarouchene, Thomas Salez, Thomas Guérin, Chaouqi Misbah, Abdelhamid Maali

Abstract: Lift force of viscous origin is widespread across disciplines, from mechanics to biology. Here, we present the first direct measurement of the lift force acting on a particle moving in a viscous fluid along the liquid interface that separates two liquids. The force arises from the coupling between the viscous flow induced by the particle motion and the capillary deformation of the interface. The measurements show that the lift force increases as the distance between the sphere and the interface decreases, reaching saturation at small distances. The experimental results are in good agreement with the model and numerical calculation developed within the framework of the soft lubrication theory.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01044 [pdf]

Nuclear Medicine Artificial Intelligence in Action: The Bethesda Report (AI Summit 2024)

Authors: Arman Rahmim, Tyler J. Bradshaw, Guido Davidzon, Joyita Dutta, Georges El Fakhri, Munir Ghesani, Nicolas A. Karakatsanis, Quanzheng Li, Chi Liu, Emilie Roncali, Babak Saboury, Tahir Yusufaly, Abhinav K. Jha

Abstract: The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on a prior successful 2022 AI Summit, the summit theme was: AI in Action. Six key topics included (i) an overview of prior and ongoing efforts by the AI task force, (ii) emerging needs and tools for computational nuclear oncology, (iii) new frontiers in large language and generative models, (iv) defining the value proposition for the use of AI in nuclear medicine, (v) open science including efforts for data and model repositories, and (vi) issues of reimbursement and funding. The primary efforts, findings, challenges, and next steps are summarized in this manuscript.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.17205 [pdf, ps, other]

An asymptotic expansion for a Lambert series associated to Siegel cusp forms of degree $n$

Authors: Babita, Abhash Kumar Jha, Bibekananda Maji, Manidipa Pal

Abstract: Utilizing inverse Mellin transform of the symmetric square $L$-function attached to Ramanujan tau function, Hafner and Stopple proved a conjecture of Zagier, which states that the constant term of the automorphic function $y^{12}|Δ(z)|^2$ i.e., the Lambert series $y^{12}\sum_{n=1}^\infty τ(n)^2 e^{-4 πn y}$ can be expressed in terms of the non-trivial zeros of the Riemann zeta function. This study examines certain Lambert series associated to Siegel cusp forms of degree $n$ twisted by a character $χ$ and observes a similar phenomenon.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 15 pages, comments are welcome! arXiv admin note: text overlap with arXiv:2305.07412

MSC Class: Primary 11M06; 11M26; 11F46; Secondary 11N37

arXiv:2405.15341 [pdf, other]

V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

Authors: Abdur Rahman, Rajat Chawla, Muskaan Kumar, Arkajit Datta, Adarsh Jha, Mukunda NS, Ishaan Bhola

Abstract: In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs). Despite these advancements, the nuanced interaction and understanding of GUIs pose a significant challenge, limiting the potential of existing models to enhance automation levels. To bridge this gap, this paper presents V-Zen, an innovative Multimodal Large Language Model (MLLM) meticulously crafted to revolutionise the domain of GUI understanding and grounding. Equipped with dual-resolution image encoders, V-Zen establishes new benchmarks in efficient grounding and next-action prediction, thereby laying the groundwork for self-operating computer systems. Complementing V-Zen is the GUIDE dataset, an extensive collection of real-world GUI elements and task-based sequences, serving as a catalyst for specialised fine-tuning. The successful integration of V-Zen and GUIDE marks the dawn of a new era in multimodal AI research, opening the door to intelligent, autonomous computing experiences. This paper extends an invitation to the research community to join this exciting journey, shaping the future of GUI automation. In the spirit of open science, our code, data, and model will be made publicly available, paving the way for multimodal dialogue scenarios with intricate and precise interactions.

Submitted 21 July, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 12 pages, 5 figures, 3 tables

arXiv:2405.13434 [pdf, other]

Observation of Brownian elastohydrodynamic forces acting on confined soft colloids

Authors: Nicolas Fares, Maxime Lavaud, Zaicheng Zhang, Aditya Jha, Yacine Amarouchene, Thomas Salez

Abstract: Confined motions in complex environments are ubiquitous in microbiology. These situations invariably involve the intricate coupling between fluid flow, soft boundaries, surface forces and fluctuations. In the present study, such a coupling is investigated using a novel method combining holographic microscopy and advanced statistical inference. Specifically, the Brownian motion of soft micrometric oil droplets near rigid walls is quantitatively analyzed. All the key statistical observables are reconstructed with high precision, allowing for nanoscale resolution of local mobilities and femtonewton inference of conservative or non-conservative forces. Strikingly, the analysis reveals the existence of a novel, transient, but large, soft Brownian force. The latter might be of crucial importance for microbiological and nanophysical transport, target finding or chemical reactions in crowded environments, and hence the whole life machinery.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13114 [pdf, other]

Probing CP Violation and Mass Hierarchy in Neutrino Oscillations in Matter through Quantum Speed Limits

Authors: Subhadip Bouri, Abhishek Kumar Jha, Subhashish Banerjee

Abstract: The quantum speed limits (QSLs) set fundamental lower bounds on the time required for a quantum system to evolve from a given initial state to a final state. In this work, we investigate CP violation and the mass hierarchy problem of neutrino oscillations in matter using the QSL time as a key analytical tool. We examine the QSL time for the unitary evolution of two- and three-flavor neutrino states, both in vacuum and in the presence of matter. Two-flavor neutrino oscillations are used as a precursor to their three-flavor counterparts. We further compute the QSL time for neutrino state evolution and entanglement in terms of neutrino survival and oscillation probabilities, which are experimentally measurable quantities in neutrino experiments. A difference in the QSL time between the normal and inverted mass hierarchy scenarios, for neutrino state evolution as well as for entanglement, under the effect of a CP violation phase is observed. Our results are illustrated using energy-varying sets of accelerator neutrino sources from experiments such as T2K, NOvA, and DUNE. Notably, three-flavor neutrino oscillations in constant matter density exhibit faster state evolution across all these neutrino experiments in the normal mass hierarchy scenario. Additionally, we observe fast entanglement growth in DUNE assuming a normal mass hierarchy.

Submitted 21 May, 2024; originally announced May 2024.

Comments: v1: 18 pages, 10 figures. Comments welcome

arXiv:2405.12988 [pdf, other]

Prediction of Cryptocurrency Prices through a Path Dependent Monte Carlo Simulation

Authors: Ayush Singh, Anshu K. Jha, Amit N. Kumar

Abstract: In this paper, our focus lies on Merton's jump diffusion model, employing jump processes characterized by the compound Poisson process. Our primary objective is to forecast the drift and volatility of the model using a variety of methodologies. We adopt an approach that involves implementing different drift, volatility, and jump terms within the model through various machine learning techniques, traditional methods, and statistical methods on price-volume data. Additionally, we introduce a path-dependent Monte Carlo simulation to model cryptocurrency prices, taking into account the volatility and unexpected jumps in prices.

Submitted 10 April, 2024; originally announced May 2024.

Comments: 21 pages

arXiv:2405.10804 [pdf, other]

A wavefront rotator with near-zero mean polarization change

Authors: Suman Karan, Nilakshi Senapati, Anand K. Jha

Abstract: A K-mirror is a device that rotates the wavefront of an incident optical field. It has recently gained prominence over Dove prism, another commonly used wavefront rotator, due to the fact that while a K-mirror has several controls for adjusting the internal reflections, a Dove prism is made of a single glass element with no additional control. Thus, one can obtain much lower angular deviations of transmitting wavefronts using a K-mirror than with a Dove prism. However, the accompanying polarization changes in the transmitted field due to rotation persist even in the commercially available K-mirrors. A recent theoretical work [Applied Optics, 61, 8302 (2022)] shows that it is possible to optimize the base angle of a K-mirror for a given refractive index such that the accompanying polarization changes are minimum. In contrast, we show in this article that by optimizing the refractive index it is possible to design a K-mirror at any given base angle and with any given value for the mean polarization change, including near-zero values. Furthermore, we experimentally demonstrate a K-mirror with an order-of-magnitude lower mean polarization change than that of the commercially available K-mirrors. This can have important practical implications for OAM-based applications that require precise wavefront rotation control.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Manuscript: 9 pages, 9 figures

arXiv:2404.16048 [pdf, other]

GUIDE: Graphical User Interface Data for Execution

Authors: Rajat Chawla, Adarsh Jha, Muskaan Kumar, Mukunda NS, Ishaan Bhola

Abstract: In this paper, we introduce GUIDE, a novel dataset tailored for the advancement of Multimodal Large Language Model (MLLM) applications, particularly focusing on Robotic Process Automation (RPA) use cases. Our dataset encompasses diverse data from various websites including Apollo (62.67%), Gmail (3.43%), Calendar (10.98%) and Canva (22.92%). Each data entry includes an image, a task description, the last action taken, CoT and the next action to be performed along with grounding information of where the action needs to be executed. The data is collected using our in-house advanced annotation tool NEXTAG (Next Action Grounding and Annotation Tool). The data is adapted for multiple OS, browsers and display types. It is collected by multiple annotators to capture the variation of design and the way a person uses a website. Through this dataset, we aim to facilitate research and development in the realm of LLMs for graphical user interfaces, particularly in tasks related to RPA. The dataset's multi-platform nature and coverage of diverse websites enable the exploration of cross-interface capabilities in automation tasks. We believe that our dataset will serve as a valuable resource for advancing the capabilities of multi-platform LLMs in practical applications, fostering innovation in the field of automation and natural language understanding. Using GUIDE, we build V-Zen, the first RPA model to automate multiple websites using our in-house automation tool AUTONODE.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 11 pages, 8 figures, 3 Tables and 1 Algorithm

arXiv:2404.13693 [pdf, other]

PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images

Authors: Abhishek Jha, Yogesh Rawat, Shruti Vyas

Abstract: Photovoltaic (PV) systems allow us to tap into abundant solar energy; however, they require regular maintenance for high efficiency and to prevent degradation. Traditional manual health check, using Electroluminescence (EL) imaging, is expensive and logistically challenging, which makes automated defect detection essential. Current automation approaches require extensive manual expert labeling, which is time-consuming, expensive, and prone to errors. We propose PV-S3 (Photovoltaic-Semi Supervised Segmentation), a Semi-Supervised Learning approach for semantic segmentation of defects in EL images that reduces reliance on extensive labeling. PV-S3 is a Deep learning model trained using a few labeled images along with numerous unlabeled images. We evaluate PV-S3 on multiple datasets and demonstrate its effectiveness and adaptability. With merely 20% labeled samples, we achieve an absolute improvement of 9.7% in IoU, 13.5% in Precision, 29.15% in Recall, and 20.42% in F1-Score over prior state-of-the-art supervised method (which uses 100% labeled samples) on UCF-EL dataset (largest dataset available for semantic segmentation of EL images), showing improvement in performance while reducing the annotation costs by 80%.

Submitted 17 July, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.05366 [pdf, other]

CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery

Authors: Sai Bhargav Rongali, Sarthak Mehrotra, Ankit Jha, Mohamad Hassan N C, Shirsha Bose, Tanisha Gupta, Mainak Singha, Biplab Banerjee

Abstract: In Generalized Category Discovery (GCD), we cluster unlabeled samples of known and novel classes, leveraging a training dataset of known classes. A salient challenge arises due to domain shifts between these datasets. To address this, we present a novel setting: Across Domain Generalized Category Discovery (AD-GCD) and bring forth CDAD-NET (Class Discoverer Across Domains) as a remedy. CDAD-NET is architected to synchronize potential known class samples across both the labeled (source) and unlabeled (target) datasets, while emphasizing the distinct categorization of the target data. To facilitate this, we propose an entropy-driven adversarial learning strategy that accounts for the distance distributions of target samples relative to source-domain class prototypes. Parallelly, the discriminative nature of the shared space is upheld through a fusion of three metric learning objectives. In the source domain, our focus is on refining the proximity between samples and their affiliated class prototypes, while in the target domain, we integrate a neighborhood-centric contrastive learning mechanism, enriched with an adept neighborsmining approach. To further accentuate the nuanced feature interrelation among semantically aligned images, we champion the concept of conditional image inpainting, underscoring the premise that semantically analogous images prove more efficacious to the task than their disjointed counterparts. Experimentally, CDAD-NET eclipses existing literature with a performance increment of 8-15% on three AD-GCD benchmarks we present.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted in L3D-IVU, CVPR Workshop, 2024

arXiv:2404.02804 [pdf, other]

Residual-Based a Posteriori Error Estimators for Algebraic Stabilizations

Authors: Abhinav Jha

Abstract: In this note, we extend the analysis for the residual-based a posteriori error estimators in the energy norm defined for the algebraic flux correction (AFC) schemes [Jha20.CAMWA] to the newly proposed algebraic stabilization schemes [JK21.NM, Kn23.NA]. Numerical simulations on adaptively refined grids are performed in two dimensions showing the higher efficiency of an algebraic stabilization with similar accuracy compared with an AFC scheme.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.00710 [pdf, other]

Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

Authors: Mainak Singha, Ankit Jha, Shirsha Bose, Ashwin Nair, Moloud Abdar, Biplab Banerjee

Abstract: We delve into Open Domain Generalization (ODG), marked by domain and category shifts between training's labeled source and testing's unlabeled target domains. Existing solutions to ODG face limitations due to constrained generalizations of traditional CNN backbones and errors in detecting target open samples in the absence of prior knowledge. Addressing these pitfalls, we introduce ODG-CLIP, harnessing the semantic prowess of the vision-language model, CLIP. Our framework brings forth three primary innovations: Firstly, distinct from prevailing paradigms, we conceptualize ODG as a multi-class classification challenge encompassing both known and novel categories. Central to our approach is modeling a unique prompt tailored for detecting unknown class samples, and to train this, we employ a readily accessible stable diffusion model, elegantly generating proxy images for the open class. Secondly, aiming for domain-tailored classification (prompt) weights while ensuring a balance of precision and simplicity, we devise a novel visual stylecentric prompt learning mechanism. Finally, we infuse images with class-discriminative knowledge derived from the prompt space to augment the fidelity of CLIP's visual embeddings. We introduce a novel objective to safeguard the continuity of this infused semantic intel across domains, especially for the shared classes. Through rigorous testing on diverse datasets, covering closed and open-set DG contexts, ODG-CLIP demonstrates clear supremacy, consistently outpacing peers with performance boosts between 8%-16%. Code will be available at https://github.com/mainaksingha01/ODG-CLIP.

Submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted in CVPR 2024

arXiv:2403.19241 [pdf, other]

Capillary-lubrication force between rotating cylinders separated by a fluid interface

Authors: Aditya Jha, Yacine Amarouchene, Thomas Salez

Abstract: Two cylinders rotating next to each other generate a large hydrodynamic force if the intermediate space is filled with a viscous fluid. Herein, we explore the case where the cylinders are separated by two layers of viscous immiscible fluids, in the limit of small capillary deformation of the fluid interface. As the interface deformation breaks the system's symmetry, a novel force characteristic of soft lubrication is generated. We calculate this capillary-lubrication force, which is split into velocity-dependent and acceleration-dependent contributions. Furthermore, we analyze the variations induced by modifying the viscosity ratio between the two fluid layers, their thickness ratio, and the Bond number. Unlike standard elastic cases, where a repelling soft-lubrication lift force has been abundantly reported, the current fluid bilayer setting can also exhibit an attractive force due to the non-monotonic deflection of the fluid interface when varying the sublayer thickness. Besides, at high Bond numbers, the system's response becomes analogous to the one of a Winkler-like substrate with a viscous flow inside.

Submitted 28 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.00013

arXiv:2403.17764 [pdf, other]

Can patient-specific acquisition protocol improve performance on defect detection task in myocardial perfusion SPECT?

Authors: Nu Ri Choi, Md Ashequr Rahman, Zitong Yu, Barry A. Siegel, Abhinav K. Jha

Abstract: Myocardial perfusion imaging using single-photon emission computed tomography (SPECT), or myocardial perfusion SPECT (MPS) is a widely used clinical imaging modality for the diagnosis of coronary artery disease. Current clinical protocols for acquiring and reconstructing MPS images are similar for most patients. However, for patients with outlier anatomical characteristics, such as large breasts, images acquired using conventional protocols are often sub-optimal in quality, leading to degraded diagnostic accuracy. Solutions to improve image quality for these patients outside of increased dose or total acquisition time remain challenging. Thus, there is an important need for new methodologies to improve image quality for such patients. One approach to improving this performance is adapting the image acquisition protocol specific to each patient. For this study, we first designed and implemented a personalized patient-specific protocol-optimization strategy, which we term precision SPECT (PRESPECT). This strategy integrates ideal observer theory with the constraints of tomographic reconstruction to optimize the acquisition time for each projection view, such that MPS defect detection performance is maximized. We performed a clinically realistic simulation study on patients with outlier anatomies on the task of detecting perfusion defects on various realizations of low-dose scans by an anthropomorphic channelized Hotelling observer. Our results show that using PRESPECT led to improved performance on the defect detection task for the considered patients. These results provide evidence that personalization of MPS acquisition protocol has the potential to improve defect detection performance, motivating further research to design optimal patient-specific acquisition and reconstruction protocols for MPS, as well as developing similar approaches for other medical imaging modalities.

Submitted 26 March, 2024; originally announced March 2024.

Comments: To be published in the Proceedings of SPIE, Medical Imaging 2024

arXiv:2403.17226 [pdf, other]

WIN-PDQ: A Wiener-estimator-based projection-domain quantitative SPECT method that accounts for intra-regional uptake heterogeneity

Authors: Zekun Li, Nadia Benabdallah, Daniel L. J. Thorek, Abhinav K. Jha

Abstract: SPECT can enable the quantification of activity uptake in lesions and at-risk organs in α-particle-emitting radiopharmaceutical therapies (α-RPTs). But this quantification is challenged by the low photon counts, complicated isotope physics, and the image-degrading effects in α-RPT SPECT. Thus, strategies to optimize the SPECT system and protocol designs for the task of regional uptake quantification are needed. Objectively performing this task-based optimization requires a reliable (accurate and precise) regional uptake quantification method. Conventional reconstruction-based quantification (RBQ) methods have been observed to be erroneous for α-RPT SPECT. Projection-domain quantification methods, which estimate regional uptake directly from SPECT projections, have demonstrated potential in providing reliable regional uptake estimates, but these methods assume constant uptake within the regions, an assumption that may not hold. To address these challenges, we propose WIN-PDQ, a Wiener-estimator-based projection-domain quantitative SPECT method. The method accounts for the heterogeneity within the regions of interest while estimating mean uptake. An early-stage evaluation of the method was conducted using 3D Monte Carlo-simulated SPECT of anthropomorphic phantoms with radium-223 uptake and lumpy-model-based intra-regional uptake heterogeneity. In this evaluation with phantoms of varying mean regional uptake and intra-regional uptake heterogeneity, the WIN-PDQ method yielded ensemble unbiased estimates and significantly outperformed both reconstruction-based and previously proposed projection-domain quantification methods. In conclusion, based on these preliminary findings, the proposed method is showing potential for estimating mean regional uptake in α-RPTs and towards enabling the objective task-based optimization of SPECT system and protocol designs.

Submitted 25 March, 2024; originally announced March 2024.

Comments: The work has been accepted for publication in 2024 SPIE Medical Imaging conference proceedings

arXiv:2403.16873 [pdf, other]

How accurately can quantitative imaging methods be ranked without ground truth: An upper bound on no-gold-standard evaluation

Authors: Yan Liu, Abhinav K. Jha

Abstract: Objective evaluation of quantitative imaging (QI) methods with patient data, while important, is typically hindered by the lack of gold standards. To address this challenge, no-gold-standard evaluation (NGSE) techniques have been proposed. These techniques have demonstrated efficacy in accurately ranking QI methods without access to gold standards. The development of NGSE methods has raised an important question: how accurately can QI methods be ranked without ground truth? To answer this question, we propose a Cramer-Rao bound (CRB)-based framework that quantifies the upper bound in ranking QI methods without any ground truth. We present the application of this framework in guiding the use of a well-known NGSE technique, namely the regression-without-truth (RWT) technique. Our results show the utility of this framework in quantifying the performance of this NGSE technique for different patient numbers. These results provide motivation towards studying other applications of this upper bound.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.08773 [pdf, other]

Veagle: Advancements in Multimodal Representation Learning

Authors: Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola

Abstract: Lately, researchers in artificial intelligence have been really interested in how language and vision come together, giving rise to the development of multimodal models that aim to seamlessly integrate textual and visual information. Multimodal models, an extension of Large Language Models (LLMs), have exhibited remarkable capabilities in addressing a diverse array of tasks, ranging from image captioning and visual question answering (VQA) to visual grounding. While these models have showcased significant advancements, challenges persist in accurately interpreting images and answering the question, a common occurrence in real-world scenarios. This paper introduces a novel approach to enhance the multimodal capabilities of existing models. In response to the limitations observed in current Vision Language Models (VLMs) and Multimodal Large Language Models (MLLMs), our proposed model, Veagle, incorporates a unique mechanism inspired by the successes and insights of previous works. Veagle leverages a dynamic mechanism to project encoded visual information directly into the language model. This dynamic approach allows for a more nuanced understanding of intricate details present in visual contexts. To validate the effectiveness of Veagle, we conduct comprehensive experiments on benchmark datasets, emphasizing tasks such as visual question answering and image understanding. Our results indicate an improvement of 5-6% in performance, with Veagle outperforming existing models by a notable margin. The outcomes underscore the model's versatility and applicability beyond traditional benchmarks.

Submitted 18 January, 2024; originally announced March 2024.

arXiv:2403.00788 [pdf]

PRECISE Framework: GPT-based Text For Improved Readability, Reliability, and Understandability of Radiology Reports For Patient-Centered Care

Authors: Satvik Tripathi, Liam Mutter, Meghana Muppuri, Suhani Dheer, Emiliano Garza-Frias, Komal Awan, Aakash Jha, Michael Dezube, Azadeh Tabari, Christopher P. Bridge, Dania Daye

Abstract: This study introduces and evaluates the PRECISE framework, utilizing OpenAI's GPT-4 to enhance patient engagement by providing clearer and more accessible chest X-ray reports at a sixth-grade reading level. The framework was tested on 500 reports, demonstrating significant improvements in readability, reliability, and understandability. Statistical analyses confirmed the effectiveness of the PRECISE approach, highlighting its potential to foster patient-centric care delivery in healthcare decision-making.

Submitted 19 February, 2024; originally announced March 2024.

arXiv:2403.00090 [pdf, other]

Eight-shot measurement of spatially non-stationary complex coherence function

Authors: Pranay Mohta, Abhinandan Bhattacharjee, Anand K. Jha

Abstract: Spatial coherence plays an important role in several real-world applications ranging from imaging to communication. As a result, its accurate characterization and measurement are extremely crucial for its optimal application. However, efficient measurement of an arbitrary complex spatial coherence function is still very challenging. In this letter, we propose an efficient, noise-insensitive interferometric technique that combines wavefront shearing and inversion for measuring the complex cross-spectral density function of the class of fields, in which the cross-spectral density function depends either on the difference of the spatial coordinates, or the squares of spatial coordinates, or both. This class of fields are most commonly encountered, and we experimentally demonstrate high-fidelity measurement of many stationary and non-stationary fields.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.14957 [pdf, other]

The Common Stability Mechanism behind most Self-Supervised Learning Approaches

Authors: Abhishek Jha, Matthew B. Blaschko, Yuki M. Asano, Tinne Tuytelaars

Abstract: The last couple of years have witnessed tremendous progress in self-supervised learning (SSL), the success of which can be attributed to the introduction of useful inductive biases in the learning process to learn meaningful visual representations while avoiding collapse. These inductive biases and constraints manifest themselves in the form of different optimization formulations in the SSL techniques, e.g. by utilizing negative examples in a contrastive formulation, or exponential moving average and predictor in BYOL and SimSiam. In this paper, we provide a framework to explain the stability mechanism of these different SSL techniques: i) we discuss the working mechanism of contrastive techniques like SimCLR, non-contrastive techniques like BYOL, SWAV, SimSiam, Barlow Twins, and DINO; ii) we provide an argument that despite different formulations these methods implicitly optimize a similar objective function, i.e. minimizing the magnitude of the expected representation over all data samples, or the mean of the data distribution, while maximizing the magnitude of the expected representation of individual samples over different data augmentations; iii) we provide mathematical and empirical evidence to support our framework. We formulate different hypotheses and test them using the Imagenet100 dataset.

Submitted 22 February, 2024; originally announced February 2024.

Comments: Additional visualizations (.gif): https://github.com/abskjha/CenterVectorSSL

arXiv:2402.08697 [pdf, other]

Weakly Supervised Detection of Pheochromocytomas and Paragangliomas in CT

Authors: David C. Oluigboa, Bikash Santra, Tejas Sudharshan Mathai, Pritam Mukherjee, Jianfei Liu, Abhishek Jha, Mayank Patel, Karel Pacak, Ronald M. Summers

Abstract: Pheochromocytomas and Paragangliomas (PPGLs) are rare adrenal and extra-adrenal tumors which have the potential to metastasize. For the management of patients with PPGLs, CT is the preferred modality of choice for precise localization and estimation of their progression. However, due to the myriad variations in size, morphology, and appearance of the tumors in different anatomical regions, radiologists are posed with the challenge of accurate detection of PPGLs. Since clinicians also need to routinely measure their size and track their changes over time across patient visits, manual demarcation of PPGLs is quite a time-consuming and cumbersome process. To ameliorate the manual effort spent for this task, we propose an automated method to detect PPGLs in CT studies via a proxy segmentation task. As only weak annotations for PPGLs in the form of prospectively marked 2D bounding boxes on an axial slice were available, we extended these 2D boxes into weak 3D annotations and trained a 3D full-resolution nnUNet model to directly segment PPGLs. We evaluated our approach on a dataset consisting of chest-abdomen-pelvis CTs of 255 patients with confirmed PPGLs. We obtained a precision of 70% and sensitivity of 64.1% with our proposed approach when tested on 53 CT studies. Our findings highlight the promising nature of detecting PPGLs via segmentation, and furthers the state-of-the-art in this exciting yet challenging area of rare cancer management.

Submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted at SPIE 2024. arXiv admin note: text overlap with arXiv:2402.00175

arXiv:2402.00838 [pdf, other]

OLMo: Accelerating the Science of Language Models

Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam , et al. (18 additional authors not shown)

Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.

Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00159 [pdf, other]

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations. To facilitate scientific research on language model pretraining, we curate and release Dolma, a three-trillion-token English corpus, built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. We extensively document Dolma, including its design principles, details about its construction, and a summary of its contents. We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices. Finally, we open-source our data curation toolkit to enable reproduction of our work as well as support further research in large-scale data curation.

Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

arXiv:2401.16344 [pdf, ps, other]

A $\mathrm{L}^2$-maximum principle for circular arcs on the disk

Authors: Thiago Carvalho Corso, Muhammad Hassan, Abhinav Jha, Benjamin Stamm

Abstract: In this article, we prove a novel $\mathrm{L}^2$-maximum principle for harmonic functions on the disk with respect to circular arcs. More precisely, we prove that for any harmonic function $u$ on a disk $\Omega$ with non-tangential maximal function in $\mathrm{L}^2(\partial \Omega)$, the supremum of $\lVert u \rVert_{\mathrm{L}^2(\Gamma)}$ over circular arcs $\Gamma \subset \overline{\Omega}$ is attained at the boundary $\Gamma = \partial \Omega$. We achieve this through a sharp geometry-dependent estimate on the norm $\lVert u \rVert_{\mathrm{L}^2(\Gamma)}$ in the special case where $\Gamma$ is a circular arc intersecting the boundary of $\Omega$ in exactly two points and the boundary data $u\rvert_{\partial \Omega}$ is supported along one of the connected components of $\partial \Omega \setminus \overline{\Gamma}$. As a corollary of this result, we also deduce new $\mathrm{L}^p$ maximum principles with $p \in [2,\infty)$ for circular arcs on the disk. These results have applications in the convergence analysis of Schwarz domain decomposition methods on the union of overlapping disks. We have discovered a critical error in the proof of Lemma 3.9 (highlighted in red in the paper), and therefore, the proof of Theorem 1.2 presented here is only valid under the restriction $\pi/2 \leq \theta + \sigma \leq 3\pi/2$, where $\theta, \sigma$ are the angles described in Section 2. In particular, the proofs of Corollaries 1.3--1.5 are incomplete.

Submitted 18 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

MSC Class: 30E25; 35J05; 35J57

arXiv:2401.13156 [pdf, other]

Local Hamiltonian decomposition and classical simulation of parametrized quantum circuits

Authors: Bibhas Adhikari, Aryan Jha

Abstract: In this paper we develop a classical algorithm of complexity $O(K \, 2^n)$ to simulate parametrized quantum circuits (PQCs) of $n$ qubits, where $K$ is the total number of one-qubit and two-qubit control gates. The algorithm is developed by finding $2$-sparse unitary matrices of order $2^n$ explicitly corresponding to any single-qubit and two-qubit control gates in an $n$-qubit system. Finally, we determine an analytical expression for the Hamiltonian of any such gate, and consequently a local Hamiltonian decomposition of any PQC is obtained. All results are validated with numerical simulations.

Submitted 31 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.
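
The $O(K \, 2^n)$ cost quoted in the abstract above can be illustrated with a generic state-vector simulator. This is a minimal sketch and not the authors' algorithm: the function names, the convention that qubit 0 is the least significant bit, and the Bell-state example are all assumptions made for illustration. Each gate touches every amplitude once and each output amplitude depends on at most two input amplitudes (the 2-sparse structure), so $K$ gates cost $O(K \, 2^n)$ operations.

```python
import math

def apply_one_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target`; costs O(2^n) for n qubits.

    Assumption: qubit 0 is the least significant bit of the basis index.
    """
    out = list(state)
    tbit = 1 << target
    for i in range(len(state)):
        if (i & tbit) == 0:          # i has target bit 0, j has target bit 1
            j = i | tbit
            out[i] = gate[0][0] * state[i] + gate[0][1] * state[j]
            out[j] = gate[1][0] * state[i] + gate[1][1] * state[j]
    return out

def apply_controlled_gate(state, gate, control, target):
    """Apply a 2x2 gate to `target` only where the `control` bit is 1."""
    out = list(state)
    cbit, tbit = 1 << control, 1 << target
    for i in range(len(state)):
        if (i & cbit) != 0 and (i & tbit) == 0:
            j = i | tbit
            out[i] = gate[0][0] * state[i] + gate[0][1] * state[j]
            out[j] = gate[1][0] * state[i] + gate[1][1] * state[j]
    return out

def bell_state():
    """Hypothetical 2-qubit example: Hadamard on qubit 0, then CNOT(0 -> 1)."""
    h = 1 / math.sqrt(2)
    hadamard = [[h, h], [h, -h]]
    pauli_x = [[0, 1], [1, 0]]
    state = [1.0, 0.0, 0.0, 0.0]     # |00>
    state = apply_one_qubit_gate(state, hadamard, 0)
    return apply_controlled_gate(state, pauli_x, control=0, target=1)
```

Calling `bell_state()` applies two gates to a four-amplitude vector. The same two functions extend to any `n` by passing a list of length `2 ** n`.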

arXiv:2401.06310 [pdf, other]

ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation

Authors: Akshita Jha, Vinodkumar Prabhakaran, Remi Denton, Sarah Laszlo, Shachi Dave, Rida Qadri, Chandan K. Reddy, Sunipa Dev

Abstract: Recent studies have shown that Text-to-Image (T2I) model generations can reflect social stereotypes present in the real world. However, existing approaches for evaluating stereotypes have a noticeable lack of coverage of global identity groups and their associated stereotypes. To address this gap, we introduce the ViSAGe (Visual Stereotypes Around the Globe) dataset to enable the evaluation of known nationality-based stereotypes in T2I models, across 135 nationalities. We enrich an existing textual stereotype resource by distinguishing between stereotypical associations that are more likely to have visual depictions, such as 'sombrero', from those that are less visually concrete, such as 'attractive'. We demonstrate ViSAGe's utility through a multi-faceted evaluation of T2I generations. First, we show that stereotypical attributes in ViSAGe are thrice as likely to be present in generated images of corresponding identities as compared to other attributes, and that the offensiveness of these depictions is especially higher for identities from Africa, South America, and South East Asia. Second, we assess the stereotypical pull of visual depictions of identity groups, which reveals how the 'default' representations of all identity groups in ViSAGe have a pull towards stereotypical depictions, and that this pull is even more prominent for identity groups from the Global South. CONTENT WARNING: Some examples contain offensive stereotypes.

Submitted 14 July, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: Association for Computational Linguistics (ACL) 2024

arXiv:2401.03964 [pdf, other]

Well-balanced convex limiting for finite element discretizations of steady convection-diffusion-reaction equations

Authors: Petr Knobloch, Dmitri Kuzmin, Abhinav Jha

Abstract: We address the numerical treatment of source terms in algebraic flux correction schemes for steady convection-diffusion-reaction (CDR) equations. The proposed algorithm constrains a continuous piecewise-linear finite element approximation using a monolithic convex limiting (MCL) strategy. Failure to discretize the convective derivatives and source terms in a compatible manner produces spurious ripples, e.g., in regions where the coefficients of the continuous problem are constant and the exact solution is linear. We cure this deficiency by incorporating source term components into the fluxes and intermediate states of the MCL procedure. The design of our new limiter is motivated by the desire to preserve simple steady-state equilibria exactly, as in well-balanced schemes for the shallow water equations. The results of our numerical experiments for two-dimensional CDR problems illustrate potential benefits of well-balanced flux limiting in the scalar case.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.12602 [pdf]

Magnetism of noncolinear amorphous DyCo3 and TbCo3 thin films

Authors: Zexiang Hu, Ajay Jha, Katarzyna Siewierska, Ross Smith, Karsten Rode, Plamen Stamenov, J. M. D. Coey

Abstract: The magnetization of amorphous DyCo3 and TbCo3 is studied by magnetometry, anomalous Hall effect and magneto-optic Kerr effect to understand the temperature-dependent magnetic structure. A square magnetic hysteresis loop with perpendicular magnetic anisotropy and coercivity that reaches 3.5 T in the vicinity of the compensation temperature is seen in thin films. An anhysteretic soft component, seen in the magnetization of some films but not in their Hall or Kerr loops is an artefact due to sputter-deposition on the sides of the substrate. The temperature-dependence of the net rare earth moment from 4-300 K is deduced, using the cobalt moment in amorphous YxCo1-x. The single-ion anisotropy of the quadrupole moments of the 4f atoms in the randomly-oriented local electrostatic field gradient overcomes their exchange coupling to the cobalt subnetwork, resulting in a sperimagnetic ground state where spins of the noncollinear rare-earth subnetwork are modelled by a distribution of rare earth moments within a cone whose axis is antiparallel to the ferromagnetic axis z of the cobalt subnetwork. The reduced magnetization (Jz)/J at T=0 is calculated from an atomic Hamiltonian as a function of the ratio of anisotropy to exchange energy per rare-earth atom for a range of angles between the local anisotropy axis and -z and then averaged over all directions in a hemisphere. The experimental and calculated values of (J-z)/J are close to 0.7 at low temperature for both Dy and Tb.
On increasing temperature, the magnitude of the rare earth moment and the local random anisotropy that creates the cone are reduced; the cone closes and the structure approaches collinear ferrimagnetism well above ambient temperature. An asymmetric spin flop of the exchange-coupled subnetworks appears in the vicinity of the magnetization compensation temperatures of 175 K for amorphous Dy0.25Co0.75 and 200 K for amorphous TbCo3.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 23 pages, 12 figures

arXiv:2312.10523 [pdf, other]

Paloma: A Benchmark for Evaluating Language Model Fit

Authors: Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge

Abstract: Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains: varying distributions of language. Rather than assuming perplexity on one distribution extrapolates to others, Perplexity Analysis for Language Model Assessment (Paloma) measures LM fit to 585 text domains, ranging from nytimes.com to r/depression on Reddit. We invite submissions to our benchmark and organize results by comparability based on compliance with guidelines such as removal of benchmark contamination from pretraining. Submissions can also record parameter and training token count to make comparisons of Pareto efficiency for performance as a function of these measures of cost. We populate our benchmark with results from 6 baselines pretrained on popular corpora. In case studies, we demonstrate analyses that are possible with Paloma, such as finding that pretraining without data beyond Common Crawl leads to inconsistent fit to many domains.

Submitted 16 December, 2023; originally announced December 2023.

Comments: Project Page: https://paloma.allen.ai/
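
As a rough illustration of the per-domain view of perplexity described in the abstract above, the sketch below computes perplexity separately for each domain from token log-probabilities. It is not Paloma's evaluation code: the function names and the input layout (a dict mapping a domain name to a list of natural-log token probabilities) are assumptions for illustration.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def per_domain_perplexity(logprobs_by_domain):
    """Report one perplexity per domain instead of a single pooled number.

    `logprobs_by_domain` is a hypothetical layout: {domain_name: [logprob, ...]}.
    """
    return {domain: perplexity(lps) for domain, lps in logprobs_by_domain.items()}
```

A model that assigns every token probability 0.25 has a perplexity of 4 on that domain. Keeping the numbers separate per domain is what exposes a model that fits one source well and another poorly.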

arXiv:2311.18132 [pdf, ps, other]

The Brauer Group of $\mathscr{Y}_0(2)$

Authors: Niven Achenjang, Deewang Bhamidipati, Aashraya Jha, Caleb Ji, Rose Lopez

Abstract: We determine the Brauer group of the Deligne-Mumford stack $\mathscr{Y}_0(2)$, the moduli space of elliptic curves with a marked $2$-torsion subgroup over bases of arithmetic interest. Antieau and Meier determine the Brauer group for $\mathscr{M}_{1,1}$, the moduli stack of elliptic curves, by exploiting the fact that it is covered by the Legendre family and using the Hochschild-Serre spectral sequence. Over an algebraically closed field, Shin uses the coarse space map to determine the Brauer group of $\mathscr{M}_{1,1}$. We combine techniques from both papers to determine the Brauer group of $\mathscr{Y}_0(2)$.

Submitted 4 July, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: Comments welcome, 34 pages

arXiv:2311.16047 [pdf]

doi 10.1117/12.2652661

Observer study-based evaluation of TGAN architecture used to generate oncological PET images

Authors: Roberto Fedrigo, Fereshteh Yousefirizi, Ziping Liu, Abhinav K. Jha, Robert V. Bergen, Jean-Francois Rajotte, Raymond T. Ng, Ingrid Bloise, Sara Harsini, Dan J. Kadrmas, Carlos Uribe, Arman Rahmim

Abstract: The application of computer-vision algorithms in medical imaging has increased rapidly in recent years. However, algorithm training is challenging due to limited sample sizes, lack of labeled samples, as well as privacy concerns regarding data sharing. To address these issues, we previously developed (Bergen et al. 2022) a synthetic PET dataset for Head and Neck (H&N) cancer using the temporal generative adversarial network (TGAN) architecture and evaluated its performance segmenting lesions and identifying radiomics features in synthesized images. In this work, a two-alternative forced-choice (2AFC) observer study was performed to quantitatively evaluate the ability of human observers to distinguish between real and synthesized oncological PET images. In the study, eight trained readers, including two board-certified nuclear medicine physicians, read 170 real/synthetic image pairs presented as 2D-transaxial using a dedicated web app. For each image pair, the observer was asked to identify the real image and input their confidence level with a 5-point Likert scale. P-values were computed using the binomial test and Wilcoxon signed-rank test. A heat map was used to compare the response accuracy distribution for the signed-rank test. Response accuracy for all observers ranged from 36.2% [27.9-44.4] to 63.1% [54.8-71.3]. Six out of eight observers did not identify the real image with statistical significance, indicating that the synthetic dataset was reasonably representative of oncological PET images.
Overall, this study adds validity to the realism of our simulated H&N cancer dataset, which may be implemented in the future to train AI algorithms while favoring patient confidentiality and privacy protection.

Submitted 27 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.
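
The binomial test mentioned in the abstract above asks whether an observer's hit rate in a 2AFC task differs from chance (0.5). The sketch below is one common exact two-sided formulation and is an assumption about the general technique, not the authors' analysis code: the function name and the example count are hypothetical.

```python
from math import comb

def two_sided_binomial_p(correct, trials):
    """Exact two-sided p-value against chance (p = 0.5).

    Sums the probabilities of all outcomes at least as far from trials / 2
    as the observed count. Integer arithmetic avoids rounding in the sum.
    """
    observed_dev = abs(2 * correct - trials)
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(2 * k - trials) >= observed_dev
    )
    return extreme / 2 ** trials

# Hypothetical observer: 100 correct identifications out of 170 image pairs.
p_value = two_sided_binomial_p(100, 170)
```

A p-value above the chosen significance level means the observer's accuracy is statistically indistinguishable from guessing, which is the outcome the study reports for six of its eight readers.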

arXiv:2311.15812 [pdf, other]

C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing

Authors: Avigyan Bhattacharya, Mainak Singha, Ankit Jha, Biplab Banerjee

Abstract: We focus on domain and class generalization problems in analyzing optical remote sensing images, using the large-scale pre-trained vision-language model (VLM), CLIP. While contrastively trained VLMs show impressive zero-shot generalization performance, their effectiveness is limited when dealing with diverse domains during training and testing. Existing prompt learning techniques overlook the importance of incorporating domain and content information into the prompts, which results in a drop in performance while dealing with such multi-domain data. To address these challenges, we propose a solution that ensures domain-invariant prompt learning while enhancing the expressiveness of visual features. We observe that CLIP's vision encoder struggles to identify contextual image information, particularly when image patches are jumbled up. This issue is especially severe in optical remote sensing images, where land-cover classes exhibit well-defined contextual appearances. To this end, we introduce C-SAW, a method that complements CLIP with a self-supervised loss in the visual space and a novel prompt learning technique that emphasizes both visual domain and content-specific features. We keep the CLIP backbone frozen and introduce a small set of projectors for both the CLIP encoders to train C-SAW contrastively. Experimental results demonstrate the superiority of C-SAW across multiple remote sensing benchmarks and different generalization tasks.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted in ACM ICVGIP 2023

arXiv:2311.13133 [pdf, other]

LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms

Authors: Aditi Jha, Sam Havens, Jeremy Dohmann, Alex Trott, Jacob Portes

Abstract: Large Language Models are traditionally finetuned on large instruction datasets. However, recent studies suggest that small, high-quality datasets can suffice for general purpose instruction following. This lack of consensus surrounding finetuning best practices is in part due to rapidly diverging approaches to LLM evaluation. In this study, we ask whether a small amount of diverse finetuning samples can improve performance on both traditional perplexity-based NLP benchmarks, and on open-ended, model-based evaluation. We finetune open-source MPT-7B and MPT-30B models on instruction finetuning datasets of various sizes ranging from 1k to 60k samples. We find that subsets of 1k-6k instruction finetuning samples are sufficient to achieve good performance on both (1) traditional NLP benchmarks and (2) model-based evaluation. Finally, we show that mixing textbook-style and open-ended QA finetuning datasets optimizes performance on both evaluation paradigms.

Submitted 21 November, 2023; originally announced November 2023.

Comments: 36 pages, 12 figures, NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following

arXiv:2311.02599 [pdf, other]

Learning Class and Domain Augmentations for Single-Source Open-Domain Generalization

Authors: Prathmesh Bele, Valay Bundele, Avigyan Bhattacharya, Ankit Jha, Gemma Roig, Biplab Banerjee

Abstract: Single-source open-domain generalization (SS-ODG) addresses the challenge of labeled source domains with supervision during training and unlabeled novel target domains during testing. The target domain includes both known classes from the source domain and samples from previously unseen classes. Existing techniques for SS-ODG primarily focus on calibrating source-domain classifiers to identify open samples in the target domain. However, these methods struggle with visually fine-grained open-closed data, often misclassifying open samples as closed-set classes. Moreover, relying solely on a single source domain restricts the model's ability to generalize. To overcome these limitations, we propose a novel framework called SODG-Net that simultaneously synthesizes novel domains and generates pseudo-open samples using a learning-based objective, in contrast to the ad-hoc mixing strategies commonly found in the literature. Our approach enhances generalization by diversifying the styles of known class samples using a novel metric criterion and generates diverse pseudo-open samples to train a unified and confident multi-class classifier capable of handling both open and closed-set data. Extensive experimental evaluations conducted on multiple benchmarks consistently demonstrate the superior performance of SODG-Net compared to the literature.

Submitted 5 November, 2023; originally announced November 2023.

Comments: 11 pages, WACV 2024

arXiv:2311.01691 [pdf, ps, other]

Finding Integral Points of Elliptic Curves over Imaginary Quadratic Fields

Authors: Aashraya Jha

Abstract: We determine the set of integral points on elliptic curves of rank $2$ defined over imaginary quadratic fields using quadratic Chabauty. This builds on the work of Balakrishnan, Bianchi, Besser and Müller, and gives the first example of quadratic Chabauty used to determine the integral points on a curve which is not a base change from $\mathbb{Q}$.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 22 pages, comments and feedback welcome

arXiv:2310.14604 [pdf, other]

doi 10.13140/RG.2.2.31895.96168

Beyond VaR and CVaR: Topological Risk Measures in Financial Markets

Authors: Amit Kumar Jha

Abstract: This paper introduces a novel approach to financial risk assessment by incorporating topological data analysis (TDA), specifically cohomology groups, into the evaluation of equities portfolios. The study aims to go beyond traditional risk measures like Value at Risk (VaR) and Conditional Value at Risk (CVaR), offering a more nuanced understanding of market complexities. Using daily real-world closing-price return data from the last year for three equities (Apple, Microsoft, and Google), we developed a new topological risk measure, termed Topological VaR Distance (TVaRD). Preliminary results indicate a significant change in the density of the point cloud representing the financial time series during stress conditions, suggesting that TVaRD may offer additional insights into portfolio risk and has the potential to complement existing risk management tools.

Submitted 27 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: 14 pages, 7 figures, 5 tables

MSC Class: 55N99

arXiv:2309.13470 [pdf, other]

HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues

Authors: Ankit Jha, Debabrata Pal, Mainak Singha, Naman Agarwal, Biplab Banerjee

Abstract: Recognition of remote sensing (RS) or aerial images is currently of great interest, and advancements in deep learning algorithms have added to that interest in recent years. Occlusion, intra-class variance, lighting, etc., might arise while training neural networks using unimodal RS visual input. Even though joint training of audio-visual modalities improves classification performance in a low-data regime, it has yet to be thoroughly investigated in the RS domain. Here, we aim to solve a novel problem where both the audio and visual modalities are present during the meta-training of a few-shot learning (FSL) classifier; however, one of the modalities might be missing during the meta-testing stage. This problem formulation is pertinent in the RS domain, given the difficulties in data acquisition or sensor malfunctioning. To mitigate this, we propose a novel few-shot generative framework, Hallucinated Audio-Visual Embeddings-Network (HAVE-Net), to meta-train cross-modal features from limited unimodal data. Precisely, these hallucinated features are meta-learned from base classes and used for few-shot classification on novel classes during the inference phase. The experimental results on the benchmark ADVANCE and AudioSetZSL datasets show that our hallucinated modality augmentation strategy for few-shot classification outperforms a classifier trained with real multimodal information by at least 0.8-2%.

Submitted 23 September, 2023; originally announced September 2023.

Comments: 8 pages, 2 figures, 2 tables. Accepted in Adapting to Change: Reliable Multimodal Learning Across Domains Workshop, ECML PKDD 2023

Showing 1–50 of 241 results for author: Jha, A