Visual Analysis of Prediction Uncertainty in Neural Networks for Deep Image Synthesis

Authors: Soumya Dutta, Faheem Nizar, Ahmad Amaan, Ayan Acharya

Abstract: Ubiquitous applications of deep neural networks (DNNs) in different artificial intelligence systems have led to their adoption in solving challenging visualization problems in recent years. While sophisticated DNNs offer impressive generalization, it is imperative to comprehend the quality, confidence, robustness, and uncertainty associated with their prediction. A thorough understanding of these quantities produces actionable insights that help application scientists make informed decisions. Unfortunately, the intrinsic design principles of the DNNs cannot beget prediction uncertainty, necessitating separate formulations for robust uncertainty-aware models for diverse visualization applications. To that end, this contribution demonstrates how the prediction uncertainty and sensitivity of DNNs can be estimated efficiently using various methods and then interactively compared and contrasted for deep image synthesis tasks. Our inspection suggests that uncertainty-aware deep visualization models generate illustrations of informative and superior quality and diversity. Furthermore, prediction uncertainty improves the robustness and interpretability of deep visualization models, making them practical and convenient for various scientific domains that thrive on visual analyses.

Submitted 22 May, 2024; originally announced June 2024.

arXiv:2406.17188 [pdf, other]

Geometric Median (GM) Matching for Robust Data Pruning

Authors: Anish Acharya, Inderjit S Dhillon, Sujay Sanghavi

Abstract: Data pruning, the combinatorial task of selecting a small and informative subset from a large dataset, is crucial for mitigating the enormous computational costs associated with training data-hungry modern deep learning models at scale. Since large-scale data collections are invariably noisy, developing data pruning strategies that remain robust even in the presence of corruption is critical in practice. Unfortunately, the existing heuristics for (robust) data pruning lack theoretical coherence and rely on heroic assumptions that are often unattainable by the very nature of the problem setting. Moreover, these strategies often yield sub-optimal neural scaling laws even compared to random sampling, especially in scenarios involving strong corruption and aggressive pruning rates -- making provably robust data pruning an open challenge. In response, in this work, we propose Geometric Median (GM) Matching -- a herding-style greedy algorithm (Welling, 2009) -- that yields a $k$-subset such that the mean of the subset approximates the geometric median of the (potentially) noisy dataset. Theoretically, we show that GM Matching enjoys an improved $\mathcal{O}(1/k)$ scaling over the $\mathcal{O}(1/\sqrt{k})$ scaling of uniform sampling, while achieving the optimal breakdown point of 1/2 even under arbitrary corruption. Extensive experiments across popular deep learning benchmarks indicate that GM Matching consistently outperforms prior state-of-the-art; the gains become more profound at high rates of corruption and aggressive pruning rates, making GM Matching a strong baseline for future research in robust data pruning.

Submitted 24 June, 2024; originally announced June 2024.
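
The abstract above describes a herding-style greedy selection whose subset mean tracks the geometric median. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a plain linear kernel in raw feature space, a Weiszfeld iteration for the geometric median, and selection without replacement; the function names `geometric_median` and `gm_matching` are hypothetical.

```python
import numpy as np


def geometric_median(points, n_iter=100, eps=1e-8):
    """Approximate the geometric median with Weiszfeld's iteration."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)
    for _ in range(n_iter):
        dist = np.linalg.norm(points - estimate, axis=1)
        # Clamp distances to avoid division by zero at a data point.
        weights = 1.0 / np.maximum(dist, eps)
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < eps:
            return new_estimate
        estimate = new_estimate
    return estimate


def gm_matching(points, k):
    """Greedy herding-style selection of k indices (assumed linear kernel).

    Each step picks the unused point with the largest inner product with
    the running direction theta, then moves theta by (target - chosen).
    """
    points = np.asarray(points, dtype=float)
    target = geometric_median(points)
    theta = target.copy()
    available = np.ones(len(points), dtype=bool)
    chosen = []
    for _ in range(k):
        scores = points @ theta
        scores[~available] = -np.inf  # never pick the same point twice
        idx = int(np.argmax(scores))
        chosen.append(idx)
        available[idx] = False
        theta = theta + target - points[idx]
    return chosen
```

In this sketch the robustness comes from the target: the geometric median moves little when a minority of points is corrupted, whereas the arithmetic mean does not share that property. How the paper embeds the data before matching is not stated in the abstract, so the raw-feature choice here is purely illustrative.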

arXiv:2405.20513 [pdf, other]

Deep Modeling of Non-Gaussian Aleatoric Uncertainty

Authors: Aastha Acharya, Caleb Lee, Marissa D'Alonzo, Jared Shamwell, Nisar R. Ahmed, Rebecca Russell

Abstract: Deep learning offers promising new ways to accurately model aleatoric uncertainty in robotic estimation systems, particularly when the uncertainty distributions do not conform to traditional assumptions of being fixed and Gaussian. In this study, we formulate and evaluate three fundamental deep learning approaches for conditional probability density modeling to quantify non-Gaussian aleatoric uncertainty: parametric, discretized, and generative modeling. We systematically compare the respective strengths and weaknesses of these three methods on simulated non-Gaussian densities as well as on real-world terrain-relative navigation data. Our results show that these deep learning methods can accurately capture complex uncertainty patterns, highlighting their potential for improving the reliability and robustness of estimation systems.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 8 pages, 7 figures

arXiv:2404.11686 [pdf, other]

Probing the intergalactic medium during the Epoch of Reionization using 21-cm signal power spectra

Authors: Raghunath Ghara, Abinash Kumar Shaw, Saleem Zaroubi, Benedetta Ciardi, Garrelt Mellema, Léon V. E. Koopmans, Anshuman Acharya, Madhurima Choudhury, Sambit K. Giri, Ilian T. Iliev, Qing-Bo Ma, Florent Mertens

Abstract: The redshifted 21-cm signal from the epoch of reionization (EoR) directly probes the ionization and thermal states of the intergalactic medium during that period. In particular, the distribution of the ionized regions around the radiating sources during EoR introduces scale-dependent features in the spherically-averaged EoR 21-cm signal power spectrum. The goal is to study these scale-dependent features at different stages of reionization using numerical simulations and build a source model-independent framework to probe the properties of the intergalactic medium using EoR 21-cm signal power spectrum measurements. Under the assumption of high spin temperature, we modelled the redshift evolution of the ratio of EoR 21-cm brightness temperature power spectrum and the corresponding density power spectrum using an ansatz consisting of a set of redshift and scale-independent parameters. This set of eight parameters probes the redshift evolution of the average ionization fraction and the quantities related to the morphology of the ionized regions. We have tested this ansatz on different reionization scenarios generated using different simulation algorithms and found that it is able to recover the redshift evolution of the average neutral fraction within an absolute deviation $\lesssim 0.1$. Our framework allows us to interpret 21-cm signal power spectra in terms of parameters related to the state of the IGM. This source model-independent framework can efficiently constrain reionization scenarios using multi-redshift power spectrum measurements with ongoing and future radio telescopes such as LOFAR, MWA, HERA, and SKA. This will add independent information regarding the EoR IGM properties.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 16 pages, 13 figures, 2 tables, Accepted for publication in Astronomy and Astrophysics

Report number: NORDITA 2024-009

arXiv:2404.07232 [pdf, ps, other]

Ideal Magnetohydrodynamics and Field Dislocation Mechanics

Authors: Amit Acharya

Abstract: The fully nonlinear (geometric and material) system of Field Dislocation Mechanics is reviewed to establish an exact analogy with the equations of ideal magnetohydrodynamics (ideal MHD) under suitable physically simplifying circumstances. Weak solutions with various conservation properties have been established for ideal MHD recently by Faraco, Lindberg, and Szekelyhidi using the techniques of compensated compactness of Tartar and Murat and convex integration; by the established analogy, these results would seem to be transferable to the idealization of Field Dislocation Mechanics considered. A dual variational principle is designed and discussed for this system of PDE, with the technique transferable to the study of MHD as well.

Submitted 18 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.07214 [pdf, other]

Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions

Authors: Akash Ghosh, Arkadeep Acharya, Sriparna Saha, Vinija Jain, Aman Chadha

Abstract: The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this constraint, researchers have endeavored to integrate visual capabilities with LLMs, resulting in the emergence of Vision-Language Models (VLMs). These advanced models are instrumental in tackling more intricate tasks such as image captioning and visual question answering. In our comprehensive survey paper, we delve into the key advancements within the realm of VLMs. Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs, and models that both accept and produce multimodal inputs and outputs. This classification is based on their respective capabilities and functionalities in processing and generating various modalities of data. We meticulously dissect each model, offering an extensive analysis of its foundational architecture, training data sources, as well as its strengths and limitations wherever possible, providing readers with a comprehensive understanding of its essential components. We also analyzed the performance of VLMs in various benchmark datasets. By doing so, we aim to offer a nuanced understanding of the diverse landscape of VLMs. Additionally, we underscore potential avenues for future research in this dynamic domain, anticipating further breakthroughs and advancements.

Submitted 12 April, 2024; v1 submitted 20 February, 2024; originally announced April 2024.

Comments: The most extensive and up to date Survey on Visual Language Models covering 76 Visual Language Models

arXiv:2403.06061 [pdf, other]

Coupled Dislocations and Fracture dynamics at finite deformation: model derivation, and physical questions

Authors: Amit Acharya

Abstract: A continuum mechanical model of coupled dislocation based plasticity and fracture at finite deformation is proposed. Motivating questions and target applications of the model are sketched.

Submitted 3 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.00779 [pdf, other]

Mid-surface scaling invariance of some bending strain measures

Authors: Amit Acharya

Abstract: The mid-surface scaling invariance of bending strain measures proposed in [Acharya (2000)] is discussed in light of the work of [arXiv:2010.14308].

Submitted 3 April, 2024; v1 submitted 15 February, 2024; originally announced March 2024.

Comments: Reply to arXiv:2010.14308. This preprint has been classified as math.AP by arXiv moderators

arXiv:2402.06038 [pdf, other]

Contrastive Approach to Prior Free Positive Unlabeled Learning

Authors: Anish Acharya, Sujay Sanghavi

Abstract: Positive Unlabeled (PU) learning refers to the task of learning a binary classifier given a few labeled positive samples and a set of unlabeled samples (which could be positive or negative). In this paper, we propose a novel PU learning framework that starts by learning a feature space through pretext-invariant representation learning and then applies pseudo-labeling to the unlabeled examples, leveraging the concentration property of the embeddings. Overall, our proposed approach handily outperforms state-of-the-art PU learning methods across several standard PU benchmark datasets, while not requiring a-priori knowledge or estimate of class prior. Remarkably, our method remains effective even when labeled data is scant, where most PU learning algorithms falter. We also provide a simple theoretical analysis motivating our proposed algorithms and establish a generalization guarantee for our approach.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.02811 [pdf, other]

Multi-scale fMRI time series analysis for understanding neurodegeneration in MCI

Authors: Ammu R., Debanjali Bhattacharya, Ameiy Acharya, Ninad Aithal, Neelam Sinha

Abstract: In this study, we present a technique that spans multi-scale views (global scale -- meaning brain network-level and local scale -- examining each individual ROI that constitutes the network) applied to resting-state fMRI volumes. Deep learning based classification is utilized in understanding neurodegeneration. The novelty of the proposed approach lies in utilizing two extreme scales of analysis. One branch considers the entire network within a graph-analysis framework. Concurrently, the second branch scrutinizes each ROI within a network independently, focusing on the evolution of dynamics. For each subject, the graph-based approach employs partial correlation to profile the subject in a single graph where each ROI is a node, providing insights into differences in levels of participation. In contrast, non-linear analysis employs recurrence plots to profile a subject as a multichannel 2D image, revealing distinctions in underlying dynamics. The proposed approach is employed for classification of a cohort of 50 healthy control (HC) and 50 Mild Cognitive Impairment (MCI) subjects, sourced from the ADNI dataset. Results point to: (1) reduced activity in ROIs such as PCC in MCI; (2) greater activity in occipital in MCI, which is not seen in HC; (3) when analysed for dynamics, all ROIs in MCI show greater predictability in time-series.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 12 pages, 3 figures and 4 tables
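
The abstract above profiles each subject as a multichannel 2D image built from recurrence plots. The snippet below is a minimal sketch of that construction under simplifying assumptions: it thresholds absolute differences of the raw scalar series (no time-delay embedding), uses a single hypothetical tolerance `eps` for all ROIs, and stacks one channel per ROI. The function names are illustrative and do not come from the paper.

```python
import numpy as np


def recurrence_plot(series, eps):
    """Binary recurrence matrix: R[i, j] = 1 when |x_i - x_j| <= eps."""
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(np.uint8)


def multichannel_recurrence(roi_series, eps):
    """Stack one recurrence plot per ROI into a (T, T, n_roi) image."""
    return np.stack([recurrence_plot(s, eps) for s in roi_series], axis=-1)
```

A series that revisits the same values often yields dense off-diagonal structure, which is one way greater predictability can show up in such an image.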

arXiv:2401.12332 [pdf, other]

A Precise Characterization of SGD Stability Using Loss Surface Geometry

Authors: Gregory Dexter, Borja Ocejo, Sathiya Keerthi, Aman Gupta, Ayan Acharya, Rajiv Khanna

Abstract: Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical successes but relatively limited theoretical understanding. Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates. Several studies have investigated the linear stability property of SGD in the vicinity of a stationary point as a predictive proxy for sharpness and generalization error in overparameterized neural networks (Wu et al., 2022; Jastrzebski et al., 2019; Cohen et al., 2021). In this paper, we delve deeper into the relationship between linear stability and sharpness. More specifically, we meticulously delineate the necessary and sufficient conditions for linear stability, contingent on hyperparameters of SGD and the sharpness at the optimum. Towards this end, we introduce a novel coherence measure of the loss Hessian that encapsulates pertinent geometric properties of the loss function that are relevant to the linear stability of SGD. It enables us to provide a simplified sufficient condition for identifying linear instability at an optimum. Notably, compared to previous works, our analysis relies on significantly milder assumptions and is applicable for a broader class of loss functions than known before, encompassing not only mean-squared error but also cross-entropy loss.

Submitted 22 January, 2024; originally announced January 2024.

Comments: To appear at ICLR 2024

arXiv:2401.08814 [pdf, other]

Inviscid Burgers as a degenerate elliptic problem

Authors: Uditnarayan Kouskiya, Amit Acharya

Abstract: We demonstrate the feasibility of a scheme to obtain approximate weak solutions to the (inviscid) Burgers equation in conservation and Hamilton-Jacobi form, treated as degenerate elliptic problems. We show different variants recover non-unique weak solutions as appropriate, and also specific constructive approaches to recover the corresponding entropy solutions.

Submitted 12 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.08538 [pdf, other]

A Hidden Convexity of Nonlinear Elasticity

Authors: Siddharth Singh, Janusz Ginster, Amit Acharya

Abstract: A technique for developing convex dual variational principles for the governing PDE of nonlinear elastostatics and elastodynamics is presented. This allows the definition of notions of a variational dual solution and a dual solution corresponding to the PDEs of nonlinear elasticity, even when the latter arise as formal Euler-Lagrange equations corresponding to non-quasiconvex elastic energy functionals whose energy minimizers do not exist. This is demonstrated rigorously in the case of elastostatics for the Saint-Venant Kirchhoff material (in all dimensions), where the existence of variational dual solutions is also proven. The existence of a variational dual solution for the incompressible neo-Hookean material in 2-d is also shown. Stressed and unstressed elastostatic and elastodynamic solutions in 1 space dimension corresponding to a non-convex, double-well energy are computed using the dual methodology. In particular, we show the stability of a dual elastodynamic equilibrium solution for which there are regions of non-vanishing length with negative elastic stiffness, i.e., non-hyperbolic regions, for which the corresponding primal problem is ill-posed and demonstrates an explosive 'Hadamard instability'; this appears to have implications for the modeling of physically observed softening behavior in macroscopic mechanical response.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.06372 [pdf, other]

doi: 10.3847/2515-5172/ad18b5

Spectral fit residuals as an indicator to increase model complexity

Authors: Anshuman Acharya, Vinay L. Kashyap

Abstract: Spectral fitting of X-ray data usually involves minimizing statistics like the chi-square and the Cash statistic. Here we discuss their limitations and introduce two measures based on the cumulative sum (CuSum) of model residuals to evaluate whether model complexity could be increased: the percentage of bins exceeding a nominal threshold in a CuSum array ($\mathrm{pct}_{\mathrm{CuSum}}$), and the excess area under the CuSum compared to the nominal ($p_{\mathrm{area}}$). We demonstrate their use with an application to a Chandra ACIS spectral fit.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 3 pages, 1 figure, published in the Research Notes of the American Astronomical Society (RNAAS)

Journal ref: Res. Notes AAS 8 1 (2024)
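
The first measure in the abstract above, the percentage of CuSum bins exceeding a nominal threshold, can be sketched as follows. This is an illustration under stated assumptions rather than the published definition: residuals are taken as (data - model) / sigma, the excursion is measured in absolute value, and `threshold` is a hypothetical user-supplied constant, whereas the abstract does not say how the nominal threshold is obtained.

```python
import numpy as np


def pct_cusum(data, model, sigma, threshold):
    """Percentage of bins whose |cumulative residual sum| exceeds threshold."""
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    residuals = (data - model) / np.asarray(sigma, dtype=float)
    cusum = np.cumsum(residuals)
    return 100.0 * np.mean(np.abs(cusum) > threshold)
```

For residuals of 1, 1, 1, -3 the running sum is 1, 2, 3, 0, so a threshold of 1.5 flags two of the four bins. A good fit should keep the running sum near zero, while a systematic run of same-sign residuals pushes it away from zero for many consecutive bins.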

arXiv:2401.01596 [pdf, other]

MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries

Authors: Akash Ghosh, Arkadeep Acharya, Prince Jha, Aniket Gaudgaul, Rajdeep Majumdar, Sriparna Saha, Aman Chadha, Raghav Jain, Setu Sinha, Shivani Agarwal

Abstract: In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also, prior works in the area of medical question summarization have been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available.

Submitted 3 January, 2024; originally announced January 2024.

Comments: ECIR 2024

arXiv:2312.14288 [pdf]

The Status and Prospects of Phytoremediation of Heavy Metals

Authors: Aniruddha Acharya, Enrique Perez, Miller Maddox-Mandolini, Hania De La Fuente

Abstract: The release of heavy metals into agricultural soil and waterbodies has been accelerated by anthropogenic activities. Heavy metals are not usually required for biological functions; thus, their accumulation in biological systems poses a serious threat to health and the environment globally. Phytoremediation offers a safe, inexpensive, and ecologically sustainable technique to clean habitats contaminated with heavy metals. Though several plants have been identified and used as potential candidates for such phytoremediation, the technique is still at its formative stage and has been mostly confined to laboratories and greenhouses. However, several recent field studies have shown promising results that can propel large-scale implementation of this technology in industrial sites and urban agriculture. Realistically, the commercialization of this technique is possible if an interdisciplinary approach is employed to increase its efficiency. This review presents a comprehensive narration of the status and future of the technique. It illustrates the concept of phytoremediation, the ecological and commercial benefits, and the types of phytoremediation. The candidate plants and the factors that influence phytoremediation are discussed. Finally, the physiological and molecular mechanisms, along with the future of the technique, are described.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 34 pages, 3 figures, 2 tables, review paper

arXiv:2312.11541 [pdf, other]

CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare

Authors: Akash Ghosh, Arkadeep Acharya, Raghav Jain, Sriparna Saha, Aman Chadha, Setu Sinha

Abstract: In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of medical conditions, we introduce the Multimodal Medical Question Summarization (MMQS) Dataset. This dataset, a major contribution to our work, pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. We also propose a framework, utilizing the power of Contrastive Language Image Pretraining (CLIP) and Large Language Models (LLMs), consisting of four modules that identify medical disorders, generate relevant context, filter medical concepts, and craft visually aware summaries. Our comprehensive framework harnesses the power of CLIP, a multimodal foundation model, and various general-purpose LLMs, comprising four main modules: the medical disorder identification module, the relevant context generation module, the context filtration module for distilling relevant medical concepts and knowledge, and finally, a general-purpose LLM to generate visually aware medical question summaries. Leveraging our MMQS dataset, we showcase how visual cues from images enhance the generation of medically nuanced summaries. This multimodal approach not only enhances the decision-making process in healthcare but also fosters a more nuanced understanding of patient queries, laying the groundwork for future research in personalized and responsive medical care.

Submitted 15 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2312.09378 [pdf, other]

Emergent fault friction and supershear in a continuum model of geophysical rupture

Authors: Abhishek Arora, Amit Acharya

Abstract: Important physical observations in rupture dynamics such as static fault friction, short-slip, self-healing, and supershear phenomenon in cracks are studied. A continuum model of rupture dynamics is developed using the field dislocation mechanics (FDM) theory. The energy density function in our model encodes accepted and simple physical facts related to rocks and granular materials under compression. We work within a 2-dimensional ansatz of FDM where the rupture front is allowed to move only in a horizontal fault layer sandwiched between elastic blocks. Damage via the degradation of elastic modulus is allowed to occur only in the fault layer, characterized by the amount of plastic slip. The theory dictates the evolution equation of the plastic shear strain to be a Hamilton-Jacobi (H-J) equation, resulting in the representation of a propagating rupture front, which is fully coupled to elastodynamics in the whole domain. Our simulations recover static friction laws as emergent features of our model, without putting in by hand any such discontinuous criteria. Estimates of material parameters of cohesion and friction angle are deduced. Short-slip and slip-weakening (crack-like) behaviors are also reproduced as a function of the degree of damage behind the rupture front. The long-time behavior of a moving rupture front is probed, and it is deduced that the equilibrium profiles under no shear stress are not traveling wave profiles under non-zero shear load. However, it is shown that a traveling wave structure is likely attained in the limit of long times. Finally, a crack-like damage front is driven by an initial impact loading, and it is observed in our simulations that an upper bound to the crack speed is the dilatational wave speed of the material unless the material is put under pre-stressed conditions, in which case supersonic motion can be obtained.

Submitted 14 December, 2023; originally announced December 2023.

Comments: keywords: Dynamic Rupture; fault friction; supershear; yield criterion

arXiv:2312.03742 [pdf, other]

Clinical Risk Prediction Using Language Models: Benefits And Considerations

Authors: Angeela Acharya, Sulabh Shrestha, Anyi Chen, Joseph Conte, Sanja Avramovic, Siddhartha Sikdar, Antonios Anastasopoulos, Sanmay Das

Abstract: The utilization of Electronic Health Records (EHRs) for clinical risk prediction is on the rise. However, strict privacy regulations limit access to comprehensive health records, making it challenging to apply standard machine learning algorithms in practical real-world scenarios. Previous research has addressed this data limitation by incorporating medical ontologies and employing transfer learning methods. In this study, we investigate the potential of leveraging language models (LMs) as a means to incorporate supplementary domain knowledge for improving the performance of various EHR-based risk prediction tasks. Unlike applying LMs to unstructured EHR data such as clinical notes, this study focuses on using textual descriptions within structured EHR to make predictions exclusively based on that information. We extensively compare against previous approaches across various data types and sizes. We find that employing LMs to represent structured EHRs, such as diagnostic histories, leads to improved or at least comparable performance in diverse risk prediction tasks. Furthermore, LM-based approaches offer numerous advantages, including few-shot learning, the capability to handle previously unseen medical concepts, and adaptability to various medical vocabularies. Nevertheless, we underscore, through various experiments, the importance of being cautious when employing such models, as concerns regarding the reliability of LMs persist.

Submitted 28 November, 2023; originally announced December 2023.

Comments: 12 pages, 6 figures, 4 tables

arXiv:2312.01768 [pdf, other]

Localizing and Assessing Node Significance in Default Mode Network using Sub-Community Detection in Mild Cognitive Impairment

Authors: Ameiy Acharya, Chakka Sai Pradeep, Neelam Sinha

Abstract: Our study aims to utilize fMRI to identify the affected brain regions within the Default Mode Network (DMN) in subjects with Mild Cognitive Impairment (MCI), using a novel Node Significance Score (NSS). We construct subject-specific DMN graphs by employing partial correlation of Regions of Interest (ROIs) that make-up the DMN. For the DMN graph, ROIs are the nodes and edges are determined based on partial correlation. Four popular community detection algorithms (Clique Percolation Method (CPM), Louvain algorithm, Greedy Modularity and Leading Eigenvectors) are applied to determine the largest sub-community. NSS ratings are derived for each node, considering (I) frequency in the largest sub-community within a class across all subjects and (II) occurrence in the largest sub-community according to all four methods. After computing the NSS of each ROI in both healthy and MCI subjects, we quantify the score disparity to identify nodes most impacted by MCI. The results reveal a disparity exceeding 20% for 10 DMN nodes, maximally for PCC and Fusiform, showing 45.69% and 43.08% disparity. This aligns with existing medical literature, additionally providing a quantitative measure that enables the ordering of the affected ROIs. These findings offer valuable insights and could lead to treatment strategies aggressively targeting the affected nodes.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 4 pages, 2 figures

arXiv:2311.16633 [pdf, other]

21-cm Signal from the Epoch of Reionization: A Machine Learning upgrade to Foreground Removal with Gaussian Process Regression

Authors: Anshuman Acharya, Florent Mertens, Benedetta Ciardi, Raghunath Ghara, Léon V. E. Koopmans, Sambit K. Giri, Ian Hothi, Qing-Bo Ma, Garrelt Mellema, Satyapan Munshi

Abstract: In recent years, a Gaussian Process Regression (GPR) based framework has been developed for foreground mitigation from data collected by the LOw-Frequency ARray (LOFAR), to measure the 21-cm signal power spectrum from the Epoch of Reionization (EoR) and Cosmic Dawn. However, it has been noted that through this method there can be a significant amount of signal loss if the EoR signal covariance is misestimated. To obtain better covariance models, we propose to use a kernel trained on the {\tt GRIZZLY} simulations using a Variational Auto-Encoder (VAE) based algorithm. In this work, we explore the abilities of this Machine Learning based kernel (VAE kernel) used with GPR, by testing it on mock signals from a variety of simulations, exploring noise levels corresponding to $\approx$10 nights ($\approx$141 hours) and $\approx$100 nights ($\approx$1410 hours) of observations with LOFAR. Our work suggests the possibility of successful extraction of the 21-cm signal within 2$σ$ uncertainty in most cases using the VAE kernel, with better recovery of both shape and power than with previously used covariance models. We also explore the role of the excess noise component identified in past applications of GPR and additionally analyse the possibility of redshift dependence on the performance of the VAE kernel. The latter allows us to prepare for future LOFAR observations at a range of redshifts, as well as compare with results from other telescopes.

Submitted 29 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 13 pages, 7 figures, 3 tables. Accepted for publication in the Monthly Notices of the Royal Astronomical Society

Report number: NORDITA 2023-074

arXiv:2311.12289 [pdf, other]

ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science

Authors: Sai Munikoti, Anurag Acharya, Sridevi Wagle, Sameera Horawalavithana

Abstract: Large language models record impressive performance on many natural language processing tasks. However, their knowledge capacity is limited to the pretraining corpus. Retrieval augmentation offers an effective solution by retrieving context from external knowledge sources to complement the language model. However, existing retrieval augmentation techniques ignore the structural relationships between these documents. Furthermore, retrieval models are not explored much in scientific tasks, especially in regard to the faithfulness of retrieved documents. In this paper, we propose a novel structure-aware retrieval augmented language model that accommodates document structure during retrieval augmentation. We create a heterogeneous document graph capturing multiple types of relationships (e.g., citation, co-authorship, etc.) that connect documents from more than 15 scientific disciplines (e.g., Physics, Medicine, Chemistry, etc.). We train a graph neural network on the curated document graph to act as a structural encoder for the corresponding passages retrieved during the model pretraining. Particularly, along with text embeddings of the retrieved passages, we obtain structural embeddings of the documents (passages) and fuse them together before feeding them to the language model. We evaluate our model extensively on various scientific benchmarks that include science question-answering and scientific document classification tasks. Experimental results demonstrate that structure-aware retrieval improves retrieving more coherent, faithful and contextually relevant passages, while showing a comparable performance in the overall accuracy.

Submitted 20 November, 2023; originally announced November 2023.

ACM Class: I.2.7

arXiv:2311.09358 [pdf, other]

Empirical evaluation of Uncertainty Quantification in Retrieval-Augmented Language Models for Science

Authors: Sridevi Wagle, Sai Munikoti, Anurag Acharya, Sara Smith, Sameera Horawalavithana

Abstract: Large language models (LLMs) have shown remarkable achievements in natural language processing tasks, producing high-quality outputs. However, LLMs still exhibit limitations, including the generation of factually incorrect information. In safety-critical applications, it is important to assess the confidence of LLM-generated content to make informed decisions. Retrieval Augmented Language Models (RALMs) is relatively a new area of research in NLP. RALMs offer potential benefits for scientific NLP tasks, as retrieved documents, can serve as evidence to support model-generated content. This inclusion of evidence enhances trustworthiness, as users can verify and explore the retrieved documents to validate model outputs. Quantifying uncertainty in RALM generations further improves trustworthiness, with retrieved text and confidence scores contributing to a comprehensive and reliable model for scientific applications. However, there is limited to no research on UQ for RALMs, particularly in scientific contexts. This study aims to address this gap by conducting a comprehensive evaluation of UQ in RALMs, focusing on scientific tasks. This research investigates how uncertainty scores vary when scientific knowledge is incorporated as pretraining and retrieval data and explores the relationship between uncertainty scores and the accuracy of model-generated outputs. We observe that an existing RALM finetuned with scientific knowledge as the retrieval data tends to be more confident in generating predictions compared to the model pretrained only with scientific knowledge. We also found that RALMs are overconfident in their predictions, making inaccurate predictions more confidently than accurate ones. Scientific knowledge provided either as pretraining or retrieval corpus does not help alleviate this issue. We released our code, data and dashboards at https://github.com/pnnl/EXPERT2.

Submitted 15 November, 2023; originally announced November 2023.

ACM Class: I.2.7

arXiv:2311.05592 [pdf, ps, other]

Fixed-point Grover Adaptive Search for Binary Optimization Problems

Authors: Ákos Nagy, Jaime Park, Cindy Zhang, Atithi Acharya, Alex Khan

Abstract: We study a Grover-type method for Quadratic Binary Optimization problems. In the unconstrained (QUBO) case, for an $n$-dimensional problem with $m$ nonzero terms, we construct a marker oracle for such problems with a tuneable parameter, $Λ\in \left[ 1, m \right] \cap \mathbb{Z}$. At $d \in \mathbb{Z}_+$ precision, the oracle uses $O \left( n + Λd \right)$ qubits, has total depth $O \left( \tfrac{m}Λ \log_2 \left( n \right) + \log_2 \left( d \right) \right)$, and non-Clifford depth of $O \left( \tfrac{m}Λ \right)$. Moreover, each qubit required to be connected to at most $O \left( \log_2 \left( Λ+ d \right) \right)$ other qubits. In the case of a maximal graph cuts, as $d = 2 \log_2 \left( n \right)$ always suffices, the depth of the marker oracle can be made as shallow as $O \left( \log_2 \left( n \right) \right)$. For all values of $Λ$, the non-Clifford gate count of these oracles is strictly lower (by a factor of $\sim 2$) than previous constructions. We then introduce a novel \emph{Fixed-point Grover Adaptive Search for QUBO Problems}, using our oracle design and a hybrid Fixed-point Grover Search of Li et al. This method has better performance guarantees than previous Grover Adaptive Search methods. Finally, we give a heuristic argument that, with high probability and in $O \left( \tfrac{\log_2 \left( n \right)}{\sqrtε} \right)$ time, this adaptive method finds a configuration that is among the best $ε2^n$ ones.

Submitted 16 May, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 15 pages; Results substantially improved. Submitted. Comments are welcome!

arXiv:2311.04348 [pdf, other]

Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning

Authors: Sai Munikoti, Anurag Acharya, Sridevi Wagle, Sameera Horawalavithana

Abstract: Despite the dramatic progress in Large Language Model (LLM) development, LLMs often provide seemingly plausible but not factual information, often referred to as hallucinations. Retrieval-augmented LLMs provide a non-parametric approach to solve these issues by retrieving relevant information from external data sources and augment the training process. These models help to trace evidence from an externally provided knowledge base allowing the model predictions to be better interpreted and verified. In this work, we critically evaluate these models in their ability to perform in scientific document reasoning tasks. To this end, we tuned multiple such model variants with science-focused instructions and evaluated them on a scientific document reasoning benchmark for the usefulness of the retrieved document passages. Our findings suggest that models justify predictions in science tasks with fabricated evidence and leveraging scientific corpus as pretraining data does not alleviate the risk of evidence fabrication.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 5 pages

ACM Class: I.2.7

arXiv:2311.00667 [pdf]

Development and application of SEM/EDS in biological, biomedical & nanotechnological research

Authors: Aniruddha Acharya

Abstract: This comprehensive review discusses the development of scanning electron microscopy and the application of this technology in different fields such as biology, nanobiotechnology and biomedical science. Besides being a tool for high resolution imaging of surface or topography, the technology is coupled with analytical techniques such as energy dispersive spectroscopy for elemental mapping. Since the commercialization of the technology, it has developed manifold and currently very high-resolution nano scale imaging is possible by this technology. The development of FIB-SEM has allowed three-dimensional imaging of materials while the development of cryostage allows imaging of hydrated biological samples. Though variable pressure or environmental SEM can be used for imaging hydrated samples, they cannot capture a high-resolution image. SBEM and ATUM-SEM has automated the sampling process while improved and more powerful software along with user-friendly computer interface has made image analysis faster and more reliable. This review presents one of the most widely used analytical techniques used across the globe for scientific investigation. The power and potential of SEM is expanding with the development of accessory technology.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 32 pages, 5 figures, 1 table, unpublished work

arXiv:2311.00106 [pdf, other]

Variational principle for a damped, quadratically interacting particle chain with nonconservative forcing

Authors: Amit Acharya, Ambar N. Sengupta

Abstract: A method for designing variational principles for the dynamics of a possibly dissipative and non-conservatively forced chain of particles is demonstrated. Some qualitative features of the formulation are discussed.

Submitted 3 April, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.17688 [pdf, other]

doi 10.1126/science.adn0117

Managing extreme AI risks amid rapid progress

Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

Abstract: Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how exactly such risks arise, and how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation.

Submitted 22 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Published in Science: https://www.science.org/doi/10.1126/science.adn0117

arXiv:2310.15180 [pdf, other]

Lorentzian path integral in Kantowski-Sachs anisotropic cosmology

Authors: Saumya Ghosh, Arnab Acharya, Sunandan Gangopadhyay, Prasanta K. Panigrahi

Abstract: Motivated by the recent development in quantum cosmology, we revisit the anisotropic Kantowski-Sachs model in the light of a Lorentzian path integral formalism. Studies so far have considered the Euclidean method where the choice of the lapse integration contour is constrained by certain physical considerations rather than mathematical justification. In this paper, we have studied the Hartle-Hawking no-boundary proposal along with the use of Picard-Lefschetz theory in performing the lapse integration. In an isotropic limit, we show our results agree with the studies made in FLRW cosmology. We also observe that in the large scale structure the no-boundary proposal tends towards a conical singularity at the beginning of time. We have also performed a massless scalar perturbation analysis with no back reaction. This reveals that if there were any perturbation present at the beginning of the universe then that would flare up at the final boundary.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 9 pages, 4 figures

arXiv:2310.13401 [pdf, other]

Cosmic variance suppression in radiation-hydrodynamic modeling of the reionization-era 21-cm signal

Authors: Anshuman Acharya, Enrico Garaldi, Benedetta Ciardi, Qing-bo Ma

Abstract: The 21-cm line emitted by neutral hydrogen is the most promising probe of the Epoch of Reionization (EoR). Multiple radio interferometric instruments are on the cusp of detecting its power spectrum. It is therefore essential to deliver robust theoretical predictions, enabling sound inference of the coeval Universe properties. The nature of this signal traditionally required the modelling of $\mathcal{O}(10^{7-8} \, {\rm Mpc}^3)$ volumes to suppress the impact of cosmic variance. However, the recently-proposed Fixed & Paired (F&P) approach uses carefully-crafted simulation pairs to achieve equal results in smaller volumes. In this work, we thoroughly test the applicability of and improvement granted by this technique to different observables of the 21-cm signal from the EoR. We employ radiation-magneto-hydrodynamics simulations to ensure the most realistic physical description of this epoch, greatly improving over previous studies using a semi-numerical approach without accurate galaxy formation physics and radiative transfer. We estimate the statistical improvement granted by the F&P technique on predictions of the skewness, power spectrum, bispectrum and ionized regions size distribution of the 21-cm signal at redshift $7 \leq z \leq 10$ (corresponding to $\geq80\%$ of the gas being neutral). We find that the effective volume of F&P simulations is at least 3.5 times larger than traditional simulations. This directly translates into an equal improvement in the computational cost (in terms of time and memory). Finally, we confirm that a combination of different observables like skewness, power spectrum and bispectrum across different redshifts can be utilised to maximise the improvement.

Submitted 14 March, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: 13 pages, 11 figures, 2 tables. Accepted for publication in the Monthly Notices of the Royal Astronomical Society (MNRAS)

arXiv:2310.10920 [pdf, other]

NuclearQA: A Human-Made Benchmark for Language Models for the Nuclear Domain

Authors: Anurag Acharya, Sai Munikoti, Aaron Hellinger, Sara Smith, Sridevi Wagle, Sameera Horawalavithana

Abstract: As LLMs have become increasingly popular, they have been used in almost every field. But as the application for LLMs expands from generic fields to narrow, focused science domains, there exists an ever-increasing gap in ways to evaluate their efficacy in those fields. For the benchmarks that do exist, a lot of them focus on questions that don't require proper understanding of the subject in question. In this paper, we present NuclearQA, a human-made benchmark of 100 questions to evaluate language models in the nuclear domain, consisting of a varying collection of questions that have been specifically designed by experts to test the abilities of language models. We detail our approach and show how the mix of several types of questions makes our benchmark uniquely capable of evaluating models in the nuclear domain. We also present our own evaluation metric for assessing LLM's performances due to the limitations of existing ones. Our experiments on state-of-the-art models suggest that even the best LLMs perform less than satisfactorily on our benchmark, demonstrating the scientific knowledge gap of existing LLMs.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 9 pages

ACM Class: I.2.7

arXiv:2310.03201 [pdf, other]

A Hidden Convexity in Continuum Mechanics, with application to classical, continuous-time, rate-(in)dependent plasticity

Authors: Amit Acharya

Abstract: A methodology for defining variational principles for a class of PDE models from continuum mechanics is demonstrated, and some of its features explored. The scheme is applied to quasi-static and dynamic models of rate-independent and rate-dependent, single crystal plasticity at finite deformation.

Submitted 28 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: This preprint has been classified as math.AP by arXiv moderators. This paper is to appear in Mathematics and Mechanics of Solids

arXiv:2309.12631 [pdf, other]

Learning the eigenstructure of quantum dynamics using classical shadows

Authors: Atithi Acharya, Siddhartha Saha, Shagesh Sridharan, Yanis Bahroun, Anirvan M. Sengupta

Abstract: Learning dynamics from repeated observation of the time evolution of an open quantum system, namely, the problem of quantum process tomography is an important task. This task is difficult in general, but, with some additional constraints could be tractable. This motivates us to look at the problem of Lindblad operator discovery from observations. We point out that for moderate size Hilbert spaces, low Kraus rank of the channel, and short time steps, the eigenvalues of the Choi matrix corresponding to the channel have a special structure. We use the least-square method for the estimation of a channel where, for fixed inputs, we estimate the outputs by classical shadows. The resultant noisy estimate of the channel can then be denoised by diagonalizing the nominal Choi matrix, truncating some eigenvalues, and altering it to a genuine Choi matrix. This processed Choi matrix is then compared to the original one. We see that as the number of samples increases, our reconstruction becomes more accurate. We also use tools from random matrix theory to understand the effect of estimation noise in the eigenspectrum of the estimated Choi matrix.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.01885 [pdf, other]

QuantEase: Optimization-based Quantization for Language Models

Authors: Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, Sathiya Keerthi, Rahul Mazumder

Abstract: With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is framed as a discrete-structured non-convex optimization, prompting the development of algorithms rooted in Coordinate Descent (CD) techniques. These CD-based methods provide high-quality solutions to the complex non-convex layer-wise quantization problems. Notably, our CD-based approach features straightforward updates, relying solely on matrix and vector operations, circumventing the need for matrix inversion or decomposition. We also explore an outlier-aware variant of our approach, allowing for retaining significant weights (outliers) with complete precision. Our proposal attains state-of-the-art performance in terms of perplexity and zero-shot accuracy in empirical evaluations across various LLMs and datasets, with relative improvements up to 15% over methods such as GPTQ. Leveraging careful linear algebra optimizations, QuantEase can quantize models like Falcon-180B on a single NVIDIA A100 GPU in $\sim$3 hours. Particularly noteworthy is our outlier-aware algorithm's capability to achieve near or sub-3-bit quantization of LLMs with an acceptable drop in accuracy, obviating the need for non-uniform quantization or grouping techniques, improving upon methods such as SpQR by up to two times in terms of perplexity.

Submitted 1 December, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

arXiv:2308.14040 [pdf, other]

doi 10.1103/PhysRevE.108.064125

Tight-binding model subject to conditional resets at random times

Authors: Anish Acharya, Shamik Gupta

Abstract: We investigate the dynamics of a quantum system subjected to a time-dependent and conditional resetting protocol. Namely, we ask: what happens when the unitary evolution of the system is repeatedly interrupted at random time instants with an instantaneous reset to a specified set of reset configurations taking place with a probability that depends on the current configuration of the system at the instant of reset? Analyzing the protocol in the framework of the so-called tight-binding model describing the hopping of a quantum particle to nearest-neighbour sites in a one-dimensional open lattice, we obtain analytical results for the probability of finding the particle on the different sites of the lattice. We explore a variety of dynamical scenarios, including the one in which the resetting time intervals are sampled from an exponential as well as from a power-law distribution, and a set-up that includes a Floquet-type Hamiltonian involving an external periodic forcing. Under exponential resetting, and in both presence and absence of the external forcing, the system relaxes to a stationary state characterized by localization of the particle around the reset sites. The choice of the reset sites plays a defining role in dictating the relative probability of finding the particle at the reset sites as well as in determining the overall spatial profile of the site-occupation probability. Indeed, a simple choice can be engineered that makes the spatial profile highly asymmetric even when the bare dynamics does not involve the effect of any bias. Furthermore, analyzing the case of power-law resetting serves to demonstrate that the attainment of the stationary state in this quantum problem is not always evident and depends crucially on whether the distribution of reset time intervals has a finite or an infinite mean.

Submitted 19 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

Comments: 23 pages, 8 figures; v2: close to published version, 25 pages, 8 figures

Journal ref: Phys. Rev. E 108, 064125 (2023)

arXiv:2308.02427 [pdf, other]

Unlocking the Potential of Similarity Matching: Scalability, Supervision and Pre-training

Authors: Yanis Bahroun, Shagesh Sridharan, Atithi Acharya, Dmitri B. Chklovskii, Anirvan M. Sengupta

Abstract: While effective, the backpropagation (BP) algorithm exhibits limitations in terms of biological plausibility, computational cost, and suitability for online learning. As a result, there has been a growing interest in developing alternative biologically plausible learning approaches that rely on local learning rules. This study focuses on the primarily unsupervised similarity matching (SM) framework, which aligns with observed mechanisms in biological systems and offers online, localized, and biologically plausible algorithms. i) To scale SM to large datasets, we propose an implementation of Convolutional Nonnegative SM using PyTorch. ii) We introduce a localized supervised SM objective reminiscent of canonical correlation analysis, facilitating stacking SM layers. iii) We leverage the PyTorch implementation for pre-training architectures such as LeNet and compare the evaluation of features against BP-trained models. This work combines biologically plausible algorithms with computational efficiency opening multiple avenues for further explorations.

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2306.10616 [pdf, ps, other]

Action principles for dissipative, non-holonomic Newtonian mechanics

Authors: Amit Acharya, Ambar N. Sengupta

Abstract: A methodology for deriving dual variational principles for the classical Newtonian mechanics of mass points in the presence of applied forces, interaction forces, and constraints, all with a general dependence on particle velocities and positions, is presented. Methods for incorporating constraints are critically assessed. General theory, as well as explicitly worked out variational principles for a dissipative system (due to Lorenz) and a system with anholonomic constraints (due to Pars) are demonstrated. Conditions under which a (family of) dual Hamiltonian flow(s), as well as a constant(s) of motion, may be associated with a conservative or dissipative, and possibly constrained, primal system naturally emerge in this work.

Submitted 28 June, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: to be published in Proceedings of the Royal Society A

arXiv:2306.09036 [pdf, other]

Localization transitions in non-Hermitian quasiperiodic lattice

Authors: Aruna Prasad Acharya, Sanjoy Datta

Abstract: The delocalization-localization (DL) transition in non-Hermitian systems exhibits intriguing features distinct from their Hermitian counterparts. In this study, we investigate the DL transition in a generalized non-Hermitian lattice with asymmetric hopping and complex quasi-periodic potential. Irrespective of the boundary conditions, the lattice undergoes a DL transition at a critical strength of the quasiperiodic potential with identical modulation of its real and complex parts. For periodic boundary conditions (PBC), we obtained an analytical expression that accurately predicts this critical point. Our numerical results indicate that the critical point remains the same with the open boundary condition (OBC) as well. Interestingly, we observe that a difference in the modulation of the real and the complex part of potential leads to a mixed phase that appears between the delocalized and the localized phases. Intriguingly, within the mixed state region, we observed a coexistence of skin modes and localized states in the case of OBC, while in the case of PBC, a mixed phase is created by a coexistence of delocalized and localized states. We mapped out the phase diagrams for different scenarios offering valuable insights into the role of different parameters in a wide class of non-Hermitian quasiperiodic lattices.

Submitted 15 June, 2023; originally announced June 2023.

Comments: 10 pages, 21 figures

arXiv:2306.03306 [pdf, other]

Tracking Evolving labels using Cone based Oracles

Authors: Aditya Acharya, David Mount

Abstract: The evolving data framework was first proposed by Anagnostopoulos et al., where an evolver makes small changes to a structure behind the scenes. Instead of taking a single input and producing a single output, an algorithm judiciously probes the current state of the structure and attempts to continuously maintain a sketch of the structure that is as close as possible to its actual state. There have been a number of problems that have been studied in the evolving framework including our own work on labeled trees. We were motivated by the problem of maintaining a labeling in the plane, where updating the labels requires physically moving them. Applications involve tracking evolving disease hot-spots via mobile testing units, and tracking unmanned aerial vehicles. To be specific, we consider the problem of tracking labeled nodes in the plane, where an evolver continuously swaps labels of any two nearby nodes in the background unknown to us. We are tasked with maintaining a hypothesis, an approximate sketch of the locations of these labels, which we can only update by physically moving them over a sparse graph. We assume the existence of an Oracle, which when suitably probed, guides us in fixing our hypothesis.

Submitted 5 June, 2023; originally announced June 2023.

Comments: This is an abstract of a presentation given at CG:YRF 2023. It has been made public for the benefit of the community and should be considered a preprint rather than a formally reviewed paper. Thus, this work is expected to appear in a conference with formal proceedings and/or in a journal

arXiv:2305.16454 [pdf, other]

Modeling of experimentally observed topological defects inside bulk polycrystals

Authors: Siddharth Singh, He Liu, Rajat Arora, Robert M. Suter, Amit Acharya

Abstract: A rigorous methodology is developed for computing elastic fields generated by experimentally observed defect structures within grains in a polycrystal that has undergone tensile extension. An example application is made using a near-field High Energy X-ray Diffraction Microscope measurement of a zirconium sample that underwent $13.6\%$ tensile extension from an initially well-annealed state. (Sub)grain boundary features are identified with apparent disclination line defects in them. The elastic fields of these features identified from the experiment are calculated.

Submitted 7 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2304.09418 [pdf, other]

Hidden convexity in the heat, linear transport, and Euler's rigid body equations: A computational approach

Authors: Uditnarayan Kouskiya, Amit Acharya

Abstract: A finite element based computational scheme is developed and employed to assess a duality based variational approach to the solution of the linear heat and transport PDE in one space dimension and time, and the nonlinear system of ODEs of Euler for the rotation of a rigid body about a fixed point. The formulation turns initial-(boundary) value problems into degenerate elliptic boundary value problems in (space)-time domains representing the Euler-Lagrange equations of suitably designed dual functionals in each of the above problems. We demonstrate reasonable success in approximating solutions of this range of parabolic, hyperbolic, and ODE primal problems, which includes energy dissipation as well as conservation, by a unified dual strategy lending itself to a variational formulation. The scheme naturally associates a family of dual solutions to a unique primal solution; such `gauge invariance' is demonstrated in our computed solutions of the heat and transport equations, including the case of a transient dual solution corresponding to a steady primal solution of the heat equation. Primal evolution problems with causality are shown to be correctly approximated by non-causal dual problems.

Submitted 8 October, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

arXiv:2304.02464 [pdf, other]

Interface-dominated plasticity and kink bands in metallic nanolaminates

Authors: Abhishek Arora, Rajat Arora, Amit Acharya

Abstract: The theoretical and computational framework of finite deformation mesoscale field dislocation mechanics (MFDM) is used to understand the salient aspects of kink-band formation in Cu-Nb nano-metallic laminates (NMLs). A conceptually minimal, plane-strain idealization of the three-dimensional geometry, including crystalline orientation, of additively manufactured NML is used to model NMLs. Importantly, the natural jump/interface condition of MFDM imposing continuity of (certain components) of plastic strain rates across interfaces allows theory-driven `communication' of plastic flow across the laminate boundaries in our finite element implementation. Kink bands under layer parallel compression of NMLs in accord with experimental observations arise in our numerical simulations. The possible mechanisms for the formation and orientation of kink bands are discussed, within the scope of our idealized framework. We also report results corresponding to various parametric studies that provide preliminary insights and clear questions for future work on understanding the intricate underlying mechanisms for the formation of kink bands.

Submitted 8 May, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: Keywords: mesoscale plasticity, kink bands, nanometallic laminates, strain gradient plasticity

arXiv:2303.09458 [pdf]

doi 10.1016/j.jmr.2023.107478

Simulation and design of shaped pulses beyond the piecewise-constant approximation

Authors: Uluk Rasulov, Anupama Acharya, Marina Carravetta, Guinevere Mathies, Ilya Kuprov

Abstract: Response functions of resonant circuits create ringing artefacts if their input changes rapidly. When physical limits of electromagnetic spectroscopies are explored, this creates two types of problems. Firstly, simulation: the system must be propagated accurately through every response transient; this may be computationally expensive. Secondly, optimal control: circuit response must be taken into account; it may be advantageous to design pulses that are resilient to such distortions. At the root of both problems is the popular piecewise-constant approximation for control sequences in the rotating frame; in magnetic resonance it has persisted since the earliest days and has become entrenched in the commercially available hardware. In this paper, we report an implementation and benchmarks of recent Lie-group methods that can efficiently simulate and optimise smooth control sequences.

Submitted 6 May, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

arXiv:2303.06269 [pdf, other]

DEPLOYR: A technical framework for deploying custom real-time machine learning models into the electronic medical record

Authors: Conor K. Corbin, Rob Maclay, Aakash Acharya, Sreedevi Mony, Soumya Punnathanam, Rahul Thapa, Nikesh Kotecha, Nigam H. Shah, Jonathan H. Chen

Abstract: Machine learning (ML) applications in healthcare are extensively researched, but successful translations to the bedside are scant. Healthcare institutions are establishing frameworks to govern and promote the implementation of accurate, actionable and reliable models that integrate with clinical workflow. Such governance frameworks require an accompanying technical framework to deploy models in a resource efficient manner. Here we present DEPLOYR, a technical framework for enabling real-time deployment and monitoring of researcher created clinical ML models into a widely used electronic medical record (EMR) system. We discuss core functionality and design decisions, including mechanisms to trigger inference based on actions within EMR software, modules that collect real-time data to make inferences, mechanisms that close-the-loop by displaying inferences back to end-users within their workflow, monitoring modules that track performance of deployed models over time, silent deployment capabilities, and mechanisms to prospectively evaluate a deployed model's impact. We demonstrate the use of DEPLOYR by silently deploying and prospectively evaluating twelve ML models triggered by clinician button-clicks in Stanford Health Care's production instance of Epic. Our study highlights the need and feasibility for such silent deployment, because prospectively measured performance varies from retrospective estimates. By describing DEPLOYR, we aim to inform ML deployment best practices and help bridge the model implementation gap.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.09693 [pdf, other]

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul Mazumder

Abstract: Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as micro-batch SAM (mSAM). This variation involves aggregating updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM. We provide a thorough empirical evaluation of various image classification and natural language processing tasks to substantiate this theoretical advancement. We also show that contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational costs. Our implementation of mSAM yields superior generalization performance across a wide range of tasks compared to SAM, further supporting our theoretical framework.

Submitted 30 September, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2212.04343

arXiv:2302.08669 [pdf, other]

Learning to Forecast Aleatoric and Epistemic Uncertainties over Long Horizon Trajectories

Authors: Aastha Acharya, Rebecca Russell, Nisar R. Ahmed

Abstract: Giving autonomous agents the ability to forecast their own outcomes and uncertainty will allow them to communicate their competencies and be used more safely. We accomplish this by using a learned world model of the agent system to forecast full agent trajectories over long time horizons. Real world systems involve significant sources of both aleatoric and epistemic uncertainty that compound and interact over time in the trajectory forecasts. We develop a deep generative world model that quantifies aleatoric uncertainty while incorporating the effects of epistemic uncertainty during the learning process. We show on two reinforcement learning problems that our uncertainty model produces calibrated outcome uncertainty estimates over the full trajectory horizon.

Submitted 16 February, 2023; originally announced February 2023.

Comments: Accepted to ICRA 2023

arXiv:2301.09353 [pdf, other]

doi 10.1007/s00332-023-09939-5

Vector Field Models for Nematic Disclinations

Authors: Amit Acharya, Irene Fonseca, Likhit Ganedi, Kerrek Stinson

Abstract: In this paper, a model for defects that was introduced in \cite{ZANV} is studied. In the literature, the setting of most models for defects is the function space SBV (special bounded variation functions) (see, e.g., \cite{ContiGarroni, GoldmanSerfaty}). However, this model regularizes the director field to be in a Sobolev space by adding a second field to incorporate the defect. A relaxation result in the case of fixed parameters is proven along with some partial compactness results.

Submitted 23 January, 2023; originally announced January 2023.

arXiv:2301.05384

AAAI 2022 Fall Symposium: Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)

Authors: Nicholas Conlon, Aastha Acharya, Nisar Ahmed

Abstract: Modern civilian and military systems have created a demand for sophisticated intelligent autonomous machines capable of operating in uncertain dynamic environments. Such systems are realizable thanks in large part to major advances in perception and decision-making techniques, which in turn have been propelled forward by modern machine learning tools. However, these newer forms of intelligent autonomy raise questions about when/how communication of the operational intent and assessments of actual vs. supposed capabilities of autonomous agents impact overall performance. This symposium examines the possibilities for enabling intelligent autonomous systems to self-assess and communicate their ability to effectively execute assigned tasks, as well as reason about the overall limits of their competencies and maintain operability within those limits. The symposium brings together researchers working in this burgeoning area of research to share lessons learned, identify major theoretical and practical challenges encountered so far, and potential avenues for future research and real-world applications.

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2212.05975 [pdf, other]

GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources

Authors: Angeela Acharya, Siddhartha Sikdar, Sanmay Das, Huzefa Rangwala

Abstract: Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems. However, acquiring such data is not straightforward due to cost and privacy constraints, and access is often limited to aggregated data (macro data) sources. In this study, we examine synthetic data generation as a tool to extrapolate difficult-to-obtain high-resolution data by combining information from multiple easier-to-obtain lower-resolution data sources. In particular, we introduce a framework that uses a combination of univariate and multivariate frequency tables from a given target geographical location in combination with frequency tables from other auxiliary locations to generate synthetic microdata for individuals in the target location. Our method combines the estimation of a dependency graph and conditional probabilities from the target location with the use of a Gaussian copula to leverage the available information from the auxiliary locations. We perform extensive testing on two real-world datasets and demonstrate that our approach outperforms prior approaches in preserving the overall dependency structure of the data while also satisfying the constraints defined on the different variables.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 10 pages, 6 figures, Accepted for the 2022 IEEE International Conference on Big Data

arXiv:2212.04343 [pdf, other]

Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, David Durfee, Ayan Acharya, Sathiya Keerthi, Rahul Mazumder

Abstract: Modern deep learning models are over-parameterized, where the optimization setup strongly affects the generalization performance. A key element of reliable optimization for these systems is the modification of the loss function. Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima, which arguably have better generalization abilities. In this paper, we focus on a variant of SAM known as mSAM, which, during training, averages the updates generated by adversarial perturbations across several disjoint shards of a mini-batch. Recent work suggests that mSAM can outperform SAM in terms of test accuracy. However, a comprehensive empirical study of mSAM is missing from the literature -- previous results have mostly been limited to specific architectures and datasets. To that end, this paper presents a thorough empirical evaluation of mSAM on various tasks and datasets. We provide a flexible implementation of mSAM and compare the generalization performance of mSAM to the performance of SAM and vanilla training on different image classification and natural language processing tasks. We also conduct careful experiments to understand the computational cost of training with mSAM, its sensitivity to hyperparameters and its correlation with the flatness of the loss landscape. Our analysis reveals that mSAM yields superior generalization performance and flatter minima, compared to SAM, across a wide range of tasks without significantly increasing computational costs.

Submitted 6 December, 2022; originally announced December 2022.

Showing 1–50 of 184 results for author: Acharya, A