Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR

Authors: Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay, Preethi Jyothi

Abstract: Automatic speech recognition (ASR) for low-resource languages remains a challenge due to the scarcity of labeled training data. Parameter-efficient fine-tuning and text-only adaptation are two popular methods that have been used to address such low-resource settings. In this work, we investigate how these techniques can be effectively combined using a multilingual multimodal model like SeamlessM4T. Multimodal models are able to leverage unlabeled text via text-only adaptation with further parameter-efficient ASR fine-tuning, thus boosting ASR performance. We also show cross-lingual transfer from a high-resource language, achieving up to a relative 17% WER reduction over a baseline in a zero-shot setting without any labeled speech.

Submitted 17 October, 2024; originally announced October 2024.
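
Note on the metric: the "relative 17% WER reduction" quoted in the abstract above is a ratio against the baseline error rate, not a difference in percentage points. A minimal sketch of that arithmetic follows. The function name and the WER values are hypothetical placeholders chosen for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical numbers for illustration only (not taken from the paper).
baseline = 40.0  # baseline WER, in percent
adapted = 33.2   # WER after adaptation, in percent
reduction = relative_wer_reduction(baseline, adapted)
print(f"relative reduction: {reduction:.1%}")
```

With these placeholder values the absolute drop is 6.8 points, while the relative drop is 6.8 / 40.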

arXiv:2410.12890 [pdf, other]

REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models

Authors: Ambuje Gupta, Mrinal Rawat, Andreas Stolcke, Roberto Pieraccini

Abstract: Retrieval augmented generation (RAG) pipelines are commonly used in tasks such as question-answering (QA), relying on retrieving relevant documents from a vector store computed using a pretrained embedding model. However, if the retrieved context is inaccurate, the answers generated using the large language model (LLM) may contain errors or hallucinations. Although pretrained embedding models have advanced, adapting them to new domains remains challenging. Fine-tuning is a potential solution, but industry settings often lack the necessary fine-tuning data. To address these challenges, we propose REFINE, a novel technique that generates synthetic data from available documents and then uses a model fusion approach to fine-tune embeddings for improved retrieval performance in new domains, while preserving out-of-domain capability. We conducted experiments on two public datasets, SQUAD and RAG-12000, and a proprietary TOURISM dataset. Results demonstrate that even standard fine-tuning with the proposed data augmentation technique outperforms the vanilla pretrained model. Furthermore, when combined with model fusion, the proposed approach achieves superior performance, with a 5.76% improvement in recall on the TOURISM dataset, and 6.58% and 0.32% improvements on SQUAD and RAG-12000, respectively.

Submitted 16 October, 2024; originally announced October 2024.

Comments: Accepted in AJCAI'24

arXiv:2410.11954 [pdf, other]

DAXA: Traversing the X-ray desert by Democratising Archival X-ray Astronomy

Authors: David J. Turner, Jessica E. Pilling, Megan Donahue, Paul A. Giles, Kathy Romer, Agrim Gupta, Toby Wallage, Ray Wang

Abstract: We introduce a new, open-source Python module for the acquisition and processing of archival data from many X-ray telescopes - Democratising Archival X-ray Astronomy (hereafter referred to as DAXA). Our software is built to increase access to, and use of, large archives of X-ray astronomy data, providing a unified, easy-to-use Python interface to the disparate archives and processing tools. We provide this interface for the majority of X-ray telescopes launched within the last 30 years. This module enables much greater access to X-ray data for non-specialists, while preserving low-level control of processing for X-ray experts. It is useful for identifying relevant observations of a single object of interest, but it excels at creating multi-mission datasets for serendipitous or targeted studies of large samples of X-ray emitting objects. The management and organization of datasets is also made easier; DAXA archives can be version controlled and updated if new data become available. Once relevant observations are identified, the raw data can be downloaded (and optionally processed) through DAXA, or pre-processed event lists, images, and exposure maps can be downloaded if they are available. X-ray observations are perfectly suited to serendipitous discoveries and archival analyses, and with a decade-long 'X-ray desert' potentially on the horizon, archival data will take on even greater importance; enhanced access to those archives will be vital to the continuation of X-ray astronomy.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 5 pages, 1 figure, submitted to JOSS; GitHub repository - https://github.com/DavidT3/DAXA; Documentation - https://daxa.readthedocs.io/

arXiv:2410.11471 [pdf, other]

Unbiased estimation of second-order parameter sensitivities for stochastic reaction networks

Authors: Quentin Badolle, Ankit Gupta, Mustafa Khammash

Abstract: Stochastic models for chemical reaction networks are increasingly popular in systems and synthetic biology. These models formulate the reaction dynamics as Continuous-Time Markov Chains (CTMCs) whose propensities are parameterized by a vector $θ$, and parameter sensitivities are introduced as derivatives of their expected outputs with respect to components of the parameter vector. Sensitivities characterise key properties of the output like robustness and are also at the heart of numerically efficient optimisation routines like Newton-type algorithms used in parameter inference and the design of control mechanisms. Currently, the only unbiased estimator for second-order sensitivities is based on the Girsanov transform, and it often suffers from high estimator variance. We develop a novel estimator for second-order sensitivities by first rigorously deriving an integral representation of these sensitivities. We call the resulting method the Double Bernoulli Path Algorithm and illustrate its efficiency through numerical examples.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.11079 [pdf]

Code-Mixer Ya Nahi: Novel Approaches to Measuring Multilingual LLMs' Code-Mixing Capabilities

Authors: Ayushman Gupta, Akhil Bhogal, Kripabandhu Ghosh

Abstract: Multilingual Large Language Models (LLMs) have demonstrated exceptional performance in Machine Translation (MT) tasks. However, their MT abilities in the context of code-switching (the practice of mixing two or more languages in an utterance) remain under-explored. In this paper, we introduce Rule-Based Prompting, a novel prompting technique to generate code-mixed sentences. We measure and compare the code-mixed MT abilities of 3 popular multilingual LLMs: GPT-3.5-turbo, GPT-4, and Gemini Pro across five language pairs: English-{Hindi, Bengali, Gujarati, French, Spanish} using $k$-shot prompting ($k\in\{0, 1, 10, 20\}$) and Rule-Based Prompting. Our findings suggest that though $k$-shot prompting often leads to the best results, Rule-Based prompting shows promise in generating unique code-mixed sentences that vary in their style of code-mixing. We also use $k$-shot prompting to gauge the code-mixed to English translation abilities of multilingual LLMs. For this purpose, we create a gold-standard code-mixed dataset spanning five language pairs: English-{Hindi, Bengali, Gujarati, French, Spanish}. As a real-world application of our work, we create a code-mixed chatbot.

Submitted 14 October, 2024; originally announced October 2024.

Comments: Manuscript submitted to COLING 2025

arXiv:2410.10580 [pdf]

Multilingual Controlled Generation And Gold-Standard-Agnostic Evaluation of Code-Mixed Sentences

Authors: Ayushman Gupta, Akhil Bhogal, Kripabandhu Ghosh

Abstract: Code-mixing, the practice of alternating between two or more languages in an utterance, is a common phenomenon in multilingual communities. Due to the colloquial nature of code-mixing, there is no singular correct way to translate an English sentence into a code-mixed sentence. For this reason, standard n-gram-based MT evaluation metrics such as the BLEU score are not appropriate for code-mixed evaluation. To demonstrate this, we propose a novel method for code-mixed text generation: Controlled Generation, which parameterizes the code-mixing degree (CMD) and enables the generation of multiple semantically equivalent code-mixed sentences from a given English sentence. We introduce a robust new evaluation metric: GAME: A Gold-Standard Agnostic Measure for Evaluation of Code-Mixed Sentences. GAME is both language-agnostic and gold-standard-agnostic, i.e., unlike other metrics, GAME does not require gold-standard code-mixed sentences for evaluation, thus eliminating the need for human annotators in the code-mixed evaluation process. When used to evaluate semantically equivalent code-mixed sentences, we find that GAME scores have a lower standard deviation than BLEU scores. Further, we create and release a dataset containing gold-standard code-mixed sentences across 4 language pairs: English-{Hindi, Bengali, French, Spanish} to encourage more computational research on code-mixing.

Submitted 14 October, 2024; originally announced October 2024.

Comments: Manuscript submitted to COLING 2025

arXiv:2410.10474 [pdf, other]

European Option Pricing in Regime Switching Framework via Physics-Informed Residual Learning

Authors: Naman Krishna Pande, Puneet Pasricha, Arun Kumar, Arvind Kumar Gupta

Abstract: In this article, we employ physics-informed residual learning (PIRL) and propose a pricing method for European options under a regime-switching framework, where closed-form solutions are not available. We demonstrate that the proposed approach serves as an efficient alternative to competing pricing techniques for regime-switching models in the literature. Specifically, we demonstrate that PIRLs eliminate the need for retraining and become nearly instantaneous once trained, thus offering an efficient and flexible tool for pricing options across a broad range of specifications and parameters.

Submitted 14 October, 2024; originally announced October 2024.

arXiv:2410.09151 [pdf, other]

A search using GEO600 for gravitational waves coincident with fast radio bursts from SGR 1935+2154

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, et al. (1758 additional authors not shown)

Abstract: The magnetar SGR 1935+2154 is the only known Galactic source of fast radio bursts (FRBs). FRBs from SGR 1935+2154 were first detected by CHIME/FRB and STARE2 in 2020 April, after the conclusion of the LIGO, Virgo, and KAGRA Collaborations' O3 observing run. Here we analyze four periods of gravitational wave (GW) data from the GEO600 detector coincident with four periods of FRB activity detected by CHIME/FRB, as well as X-ray glitches and X-ray bursts detected by NICER and NuSTAR close to the time of one of the FRBs. We do not detect any significant GW emission from any of the events. Instead, using a short-duration GW search (for bursts $\leq$ 1 s) we derive 50% (90%) upper limits of $10^{48}$ ($10^{49}$) erg for GWs at 300 Hz and $10^{49}$ ($10^{50}$) erg at 2 kHz, and constrain the GW-to-radio energy ratio to $\leq 10^{14} - 10^{16}$. We also derive upper limits from a long-duration search for bursts with durations between 1 and 10 s. These represent the strictest upper limits on concurrent GW emission from FRBs.

Submitted 11 October, 2024; originally announced October 2024.

Comments: 15 pages of text including references, 4 figures, 5 tables

Report number: LIGO-P2400192

arXiv:2410.08943 [pdf, other]

BD+44 493: Chemo-Dynamical Analysis and Constraints on Companion Planetary Masses from WIYN/NEID Spectroscopy

Authors: Vinicius M. Placco, Arvind F. Gupta, Felipe Almeida-Fernandes, Sarah E. Logsdon, Jayadev Rajagopal, Erika M. Holmbeck, Ian U. Roederer, John Della Costa, Pipa Fernandez, Eli Golub, Jesus Higuera, Yatrik Patel, Susan Ridgway, Heidi Schweiker

Abstract: In this work, we present high-resolution (R~100,000), high signal-to-noise (S/N~800) spectroscopic observations for the well-known, bright, extremely metal-poor, carbon-enhanced star BD+44 493. We determined chemical abundances and upper limits for 17 elements from WIYN/NEID data, complemented with 11 abundances re-determined from Subaru and Hubble data, using the new, more accurate, stellar atmospheric parameters calculated in this work. Our analysis suggests that BD+44 493 is a low-mass (0.83Msun) old (12.1-13.2Gyr) second-generation star likely formed from a gas cloud enriched by a single metal-free 20.5Msun Population III star in the early Universe. With a disk-like orbit, BD+44 493 does not appear to be associated with any major merger event in the early history of the Milky Way. From the precision radial-velocity NEID measurements (median absolute deviation - MAD=16m/s), we were able to constrain companion planetary masses around BD+44 493 and rule out the presence of planets as small as msin(i)=2MJ out to periods of 100 days. This study opens a new avenue of exploration for the intersection between stellar archaeology and exoplanet science using NEID.

Submitted 11 October, 2024; originally announced October 2024.

Comments: 25 pages, 11 figures, accepted for publication in ApJ

arXiv:2410.07168 [pdf, other]

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

Authors: Cheol Jun Cho, Nicholas Lee, Akshat Gupta, Dhruv Agarwal, Ethan Chen, Alan W Black, Gopala K. Anumanchipalli

Abstract: Syllables are compositional units of spoken language that play a crucial role in human speech perception and production. However, current neural speech representations lack structure, resulting in dense token sequences that are costly to process. To bridge this gap, we propose a new model, Sylber, that produces speech representations with clean and robust syllabic structure. Specifically, we propose a self-supervised model that regresses features on syllabic segments distilled from a teacher model which is an exponential moving average of the model in training. This results in a highly structured representation of speech features, offering three key benefits: 1) a fast, linear-time syllable segmentation algorithm, 2) efficient syllabic tokenization with an average of 4.27 tokens per second, and 3) syllabic units better suited for lexical and syntactic understanding. We also train token-to-speech generative models with our syllabic units and show that fully intelligible speech can be reconstructed from these tokens. Lastly, we observe that categorical perception, a linguistic phenomenon of speech perception, emerges naturally in our model, making the embedding space more categorical and sparse than previous self-supervised learning approaches. Together, we present a novel self-supervised approach for representing speech as syllables, with significant potential for efficient speech tokenization and spoken language modeling.

Submitted 9 October, 2024; originally announced October 2024.
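
Note on the training setup: the abstract above describes a teacher model that is an exponential moving average (EMA) of the model in training. The sketch below shows only that generic update rule, assuming plain scalar parameters in a dictionary as a stand-in for network weights. The function name, the parameter name "w", and the decay value are hypothetical and are not taken from the Sylber code.

```python
from typing import Dict


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               decay: float = 0.999) -> None:
    """Update the teacher in place: teacher <- decay * teacher + (1 - decay) * student."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * value


# Toy example: the teacher drifts toward a fixed student value.
teacher = {"w": 0.0}
student = {"w": 1.0}
for _ in range(3):
    ema_update(teacher, student, decay=0.5)  # decay chosen only for illustration
print(teacher["w"])
```

A decay close to 1 makes the teacher change slowly, which is the usual reason for using an EMA teacher as a stable regression target.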

arXiv:2410.03784 [pdf, other]

Emission-Line Ratios and Ionization Conditions of CEERS Star-Forming Galaxies with JWST/NIRSpec

Authors: Ansh R. Gupta, Allison Kirkpatrick, Vital Fernandez, Pablo Arrabal Haro, Bren E. Backhaus, Nikko J. Cleri, Norman A. Grogin, Anton M. Koekemoer

Abstract: Galaxy emission-line fluxes can be analyzed to determine star formation rates (SFR) and ISM ionization. Here, we investigate rest-frame optical emission lines of 71 star-forming galaxies at redshift 0.7 < z < 7 from the Cosmic Evolution Early Release Science (CEERS) survey using JWST/NIRSpec. We use H$α$ line fluxes to measure SFRs. We combine these with HST CANDELS stellar mass estimates to determine the redshift evolution of specific SFR (sSFR) and compare our sample with the star-forming galaxy main sequence. We create [O III]$λ$5008/H$β$ versus [Ne III]$λ$3870/[O II]$λ$3728 line ratio diagrams and correlate these ratios with sSFR and the distance of each galaxy from the main sequence (excess sSFR). We find a modest correlation between the line ratios and sSFR, which is consistent with previous work analyzing similar samples. However, we find a weak correlation between the line ratios and excess sSFR. Taken together, our results suggest that sSFR is the parameter that governs ionization conditions rather than SFR or a galaxy's distance from the main sequence. These measurements reveal a rich diversity of ISM conditions and physical galaxy properties throughout cosmic time.

Submitted 3 October, 2024; originally announced October 2024.

Comments: 6 pages, 1 figure. Submitted to Research Notes of the American Astronomical Society

arXiv:2410.02294 [pdf, other]

Asteroseismology of the mild Am $δ$ Sct star HD 118660 : TESS photometry and modelling

Authors: Mrinmoy Sarkar, Santosh Joshi, Marc-Antoine Dupret, Otto Trust, Peter De Cat, Eugene Semenko, Patricia Lampens, Aruna Goswami, David Mkrtichian, Drisya Karinkuzhi, Ilya Yakunin, Archana Gupta

Abstract: We present the results of an asteroseismic study of HD 118660 (TIC 171729860), a chemically peculiar (mild Am) star exhibiting $δ$ Scuti ($δ$ Sct) pulsations. It is based on the analysis of two sectors of time-series photometry from the space mission TESS and seismic modelling. It yielded the detection of 15 and 16 frequencies for TESS sectors 23 and 50, respectively. The identified pulsation modes include four radial ($\ell=0$) and five dipolar ($\ell=1$) ones. The radial modes are overtones with order $n$ ranging from $3$ to $6$. Such high values of $n$ are theoretically not expected for stars with the effective temperature of HD 118660 ($\rm T_{\rm eff}\approx 7550 \rm K$) located near the red edge of the $δ$ Sct instability strip. To estimate the asteroseismic parameters, we have generated a grid of stellar models assuming a solar metallicity ($Z=0.014$) and different values for the convective overshooting parameter ($0.1\leq α_{\rm ov}\leq 0.3$). We conclude that the analysis of the radial modes is insufficient to constrain $α_{\rm ov}$ and $Z$ for $δ$ Sct stars. The value for the equatorial velocity of HD 118660 derived from the seismic radius and the rotational frequency is consistent with values found in the literature.

Submitted 3 October, 2024; originally announced October 2024.

Comments: 10 pages, 7 figures, Accepted for publication in MNRAS

arXiv:2410.02024 [pdf, other]

FLAG: Financial Long Document Classification via AMR-based GNN

Authors: Bolun "Namir" Xia, Mohammed J. Zaki, Aparna Gupta

Abstract: The advent of large language models (LLMs) has initiated much research into their various financial applications. However, in applying LLMs on long documents, semantic relations are not explicitly incorporated, and a full or arbitrarily sparse attention operation is employed. In recent years, progress has been made in Abstract Meaning Representation (AMR), which is a graph-based representation of text to preserve its semantic relations. Since AMR can represent semantic relationships at a deeper level, it can be beneficially utilized by graph neural networks (GNNs) for constructing effective document-level graph representations built upon LLM embeddings to predict target metrics in the financial domain. We propose FLAG: Financial Long document classification via AMR-based GNN, an AMR graph based framework to generate document-level embeddings for long financial document classification. We construct document-level graphs from sentence-level AMR graphs, endow them with specialized LLM word embeddings in the financial domain, apply a deep learning mechanism that utilizes a GNN, and examine the efficacy of our AMR-based approach in predicting labeled target data from long financial documents. Extensive experiments are conducted on a dataset of quarterly earnings call transcripts of companies in various sectors of the economy, as well as on a corpus of more recent earnings calls of companies in the S&P 1500 Composite Index. We find that our AMR-based approach outperforms fine-tuning LLMs directly on text in predicting stock price movement trends at different time horizons in both datasets. Our work also outperforms previous work utilizing document graphs and GNNs for text classification.

Submitted 14 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

Comments: 8 pages, 3 figures, to be published in CIFEr Conference 2024 as "Semantic Graph Learning for Trend Prediction from Long Financial Documents"

arXiv:2410.00023 [pdf, other]

Self-Tuning Spectral Clustering for Speaker Diarization

Authors: Nikhil Raghav, Avisek Gupta, Md Sahidullah, Swagatam Das

Abstract: Spectral clustering has proven effective in grouping speech representations for speaker diarization tasks, although post-processing the affinity matrix remains difficult due to the need for careful tuning before constructing the Laplacian. In this study, we present a novel pruning algorithm to create a sparse affinity matrix called spectral clustering on p-neighborhood retained affinity matrix (SC-pNA). Our method improves on node-specific fixed neighbor selection by allowing a variable number of neighbors, eliminating the need for external tuning data as the pruning parameters are derived directly from the affinity matrix. SC-pNA does so by identifying two clusters in every row of the initial affinity matrix, and retains only the top $p\%$ similarity scores from the cluster containing larger similarities. Spectral clustering is performed subsequently, with the number of clusters determined by the maximum eigengap. Experimental results on the challenging DIHARD-III dataset highlight the superiority of SC-pNA, which is also computationally more efficient than existing auto-tuning approaches.

Submitted 16 September, 2024; originally announced October 2024.

Comments: Submitted to ICASSP 2025
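
Note on the pruning step: the abstract above says that each row of the affinity matrix is split into two clusters and that only the top $p\%$ of scores in the higher-similarity cluster are retained. The sketch below is one possible reading of that sentence, not the authors' implementation. It assumes a simple 1-D 2-means split, rounds the retained count up, zeroes all other entries, and takes p as an input; the function names and these choices are hypothetical.

```python
import math
from typing import List


def two_means_threshold(values: List[float], iters: int = 20) -> float:
    """Split values into a low and a high group with 1-D 2-means; return the boundary."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # all scores identical, nothing to split
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break  # guard against degenerate floating-point splits
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def prune_row(row: List[float], p: float) -> List[float]:
    """Keep the top p percent of the high-similarity group in one row; zero the rest."""
    threshold = two_means_threshold(row)
    high_idx = [j for j, v in enumerate(row) if v > threshold]
    if not high_idx:
        return list(row)  # no high group could be identified
    keep = max(1, math.ceil(len(high_idx) * p / 100.0))
    kept = set(sorted(high_idx, key=lambda j: row[j], reverse=True)[:keep])
    return [v if j in kept else 0.0 for j, v in enumerate(row)]


# Toy row of similarity scores (hypothetical values).
row = [1.0, 0.9, 0.8, 0.2, 0.1, 0.05]
print(prune_row(row, p=50))
```

Because the size of the high-similarity group differs from row to row, the number of retained neighbors varies per row, which matches the "variable number of neighbors" idea mentioned in the abstract.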

arXiv:2409.20548 [pdf, other]

Robi Butler: Remote Multimodal Interactions with Household Robot Assistant

Authors: Anxing Xiao, Nuwan Janaka, Tianrun Hu, Anshul Gupta, Kaixin Li, Cunjun Yu, David Hsu

Abstract: In this paper, we introduce Robi Butler, a novel household robotic system that enables multimodal interactions with remote users. Building on advanced communication interfaces, Robi Butler allows users to monitor the robot's status, send text or voice instructions, and select target objects by hand pointing. At the core of our system is a high-level behavior module, powered by Large Language Models (LLMs), that interprets multimodal instructions to generate action plans. These plans are composed of a set of open vocabulary primitives supported by Vision Language Models (VLMs) that handle both text and pointing queries. The integration of the above components allows Robi Butler to ground remote multimodal instructions in the real-world home environment in a zero-shot manner. We demonstrate the effectiveness and efficiency of this system using a variety of daily household tasks that involve remote users giving multimodal instructions. Additionally, we conducted a user study to analyze how multimodal interactions affect efficiency and user experience during remote human-robot interaction and discuss potential improvements.

Submitted 30 September, 2024; originally announced September 2024.

arXiv:2409.19736 [pdf, other]

Spectroscopic Visualization of Hard Quasi-1D Superconductivity Induced in Nanowires Deposited on a Quasi-2D Indium film

Authors: Ambikesh Gupta, Pranab Kumar Nag, Shai Kiriati, Samuel D. Escribano, Man Suk Song, Hadas Shtrikman, Yuval Oreg, Nurit Avraham, Haim Beidenkopf

Abstract: Following significant progress in the visualization and characterization of hybrid superconducting-semiconducting systems, greatly propelled by reports of Majorana zero modes in nanowire devices, considerable attention has been devoted to investigating the electronic structure at the buried superconducting-semiconducting interface and the nature of the induced superconducting correlations. The properties of that interface and the structure of the electronic wave functions that occupy it determine the functionality and the topological nature of the induced superconducting state. Here, we introduce a novel hybrid platform for proximity-inducing superconductivity in InAs$_{0.6}$Sb$_{0.4}$ nanowires, leveraging a unique architecture and material combination. By dispersing these nanowires over a superconducting Indium film we exploit Indium's high critical temperature of 3.7 K and the anticipated high spin-orbit and Zeeman couplings of InAs$_{0.6}$Sb$_{0.4}$. This design preserves the pristine top facet of the nanowires, making it highly compatible with scanning tunneling spectroscopy. Using this architecture we demonstrate that the mechanical contact supports Cooper-pair transparency as high as 90%, comparable with epitaxial interfaces. The anisotropic angular response to an applied magnetic field shows the quasi-two-dimensional nature of the parent superconductivity in the Indium film and the quasi-one-dimensional nature of the induced superconductivity in the nanowires. Our platform offers robust and advantageous foundations for studying the emergence of topological superconductivity and the interplay of superconductivity and magnetism using atomic-scale spectroscopic tools.

Submitted 29 September, 2024; originally announced September 2024.

Comments: 30 pages, 20 figures

arXiv:2409.19147 [pdf, other]

Training the Next Generation of Seismologists: Delivering Research-Grade Software Education for Cloud and HPC Computing through Diverse Training Modalities

Authors: M. Denolle, C. Tape, E. Bozdağ, Y. Wang, F. Waldhauser, A. A. Gabriel, J. Braunmiller, B. Chow, L. Ding, K. F. Feng, A. Ghosh, N. Groebner, A. Gupta, Z. Krauss, A. McPherson, M. Nagaso, Z. Niu, Y. Ni, R. Örsvuran, G. Pavlis, F. Rodriguez-Cardozo, T. Sawi, N. Schliwa, D. Schneller, Q. Shi, et al. (6 additional authors not shown)

Abstract: With the rise of data volume and computing power, seismological research requires more advanced skills in data processing, numerical methods, and parallel computing. We present the experience of conducting training workshops over various forms of delivery to support the adoption of large-scale High-Performance Computing and Cloud computing to advance seismological research. The seismological foci were on earthquake source parameter estimation in catalogs, forward and adjoint wavefield simulations in 2 and 3 dimensions at local, regional, and global scales, earthquake dynamics, ambient noise seismology, and machine learning. This contribution describes the series of workshops, the learning outcomes of the participants, and lessons learned by the instructors. Our curriculum was grounded on open and reproducible science, large-scale scientific computing and data mining, and computing infrastructure (access and usage) for HPC and the cloud. We also describe the types of teaching materials that have proven beneficial to the instruction and the sustainability of the program. We propose guidelines to deliver future workshops on these topics.

Submitted 27 September, 2024; originally announced September 2024.

arXiv:2409.18288 [pdf, other]

The hypothetical track-length fitting algorithm for energy measurement in liquid argon TPCs

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, F. Akbar, N. S. Alex, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, C. Andreopoulos, et al. (1348 additional authors not shown)

Abstract: This paper introduces the hypothetical track-length fitting algorithm, a novel method for measuring the kinetic energies of ionizing particles in liquid argon time projection chambers (LArTPCs). The algorithm finds the most probable offset in track length for a track-like object by comparing the measured ionization density as a function of position with a theoretical prediction of the energy loss as a function of the energy, including models of electron recombination and detector response. The algorithm can be used to measure the energies of particles that interact before they stop, such as charged pions that are absorbed by argon nuclei. The algorithm's energy measurement resolutions and fractional biases are presented as functions of particle kinetic energy and number of track hits using samples of stopping secondary charged pions in data collected by the ProtoDUNE-SP detector, and also in a detailed simulation. Additional studies describe impact of the dE/dx model on energy measurement performance. The method described in this paper to characterize the energy measurement performance can be repeated in any LArTPC experiment using stopping secondary charged pions.

Submitted 1 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

Report number: FERMILAB-PUB-24-0561-LBNF-PPD, CERN-EP-2024-256

arXiv:2409.17220 [pdf, other]

Spatially correlated stellar accretion in the Lupus star forming region: Evidence for ongoing infall from the interstellar medium

Authors: Andrew J. Winter, Myriam Benisty, Carlo F. Manara, Aashish Gupta

Abstract: Growing evidence suggests that protoplanetary discs may be influenced by late stage infall from the interstellar medium (ISM). It remains unclear the degree to which infall shapes disc populations at ages $\gtrsim 1$~Myr. We explore possible spatial correlations between stellar accretion rates in the Lupus star forming region, which would support the hypothesis that infall can regulate stellar accretion. We consider both the `clustered' stars towards the center of Lupus 3, and the `distributed' stars that are more sparsely distributed across the Lupus complex. We take the observed accretion rates in the literature and explore spatial correlations. In particular, we test whether the clustered stars exhibit a radial gradient in normalised accretion rates, and whether the distributed stars have spatially correlated accretion rates. We find statistically significant correlations for both the clustered and distributed samples. The clustered sample exhibits higher accretion rates in the central region, consistent with the expected Bondi-Hoyle-Lyttleton accretion rate. Stars that are spatially closer among the distributed population also exhibit more similar accretion rates. These results cannot be explained by the stellar mass distribution for either sample. Age gradients are disfavoured, though not discounted, because normalised disc dust masses are not spatially correlated across the region. Spatially correlated stellar accretion rates within the Lupus star forming region argue in favour of an environmental influence on stellar accretion, possibly combined with internal processes in the inner disc. Refined age measurements and searches for evidence of infalling material are potential ways to further test this finding.

Submitted 7 October, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

Comments: 10 pages, 6 figures, accepted for publication in A&A

arXiv:2409.17141 [pdf, other]

FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression

Authors: Fazal Mittu, Yihuan Bu, Akshat Gupta, Ashok Devireddy, Alp Eren Ozdarendeli, Anant Singh, Gopala Anumanchipalli

Abstract: While the language modeling objective has been shown to be deeply connected with compression, it is surprising that modern LLMs are not employed in practical text compression systems. In this paper, we provide an in-depth analysis of neural network and transformer-based compression techniques to answer this question. We compare traditional text compression systems with neural network and LLM-based text compression methods. Although LLM-based systems significantly outperform conventional compression methods, they are highly impractical. Specifically, LLMZip, a recent text compression system using Llama3-8B requires 9.5 days to compress just 10 MB of text, although with huge improvements in compression ratios. To overcome this, we present FineZip - a novel LLM-based text compression system that combines ideas of online memorization and dynamic context to reduce the compression time immensely. FineZip can compress the above corpus in approximately 4 hours compared to 9.5 days, a 54 times improvement over LLMZip and comparable performance. FineZip outperforms traditional algorithmic compression methods with a large margin, improving compression ratios by approximately 50\%. With this work, we take the first step towards making lossless text compression with LLMs a reality. While FineZip presents a significant step in that direction, LLMs are still not a viable solution for large-scale text compression. We hope our work paves the way for future research and innovation to solve this problem.