Search | arXiv e-print repository

doi 10.1109/I2CT61223.2024.10544178

A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations

Authors: Nidhi Kowtal, Tejas Deshpande, Raviraj Joshi

Abstract: Machine translation in low-resource language pairs faces significant challenges due to the scarcity of parallel corpora and linguistic resources. This study focuses on the case of English-Marathi language pairs, where existing datasets are notably noisy, impeding the performance of machine translation models. To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations. Our methodology leverages a multilingual SBERT model to filter out problematic translations in the training data. Specifically, we employ an IndicSBERT similarity model to assess the semantic equivalence between original and translated sentences, allowing us to retain linguistically correct translations while discarding instances with substantial deviations. The results demonstrate a significant improvement in translation quality over the baseline post-filtering with IndicSBERT. This illustrates how cross-lingual sentence representations can reduce errors in machine translation scenarios with limited resources. By integrating multilingual sentence BERT models into the translation pipeline, this research contributes to advancing machine translation techniques in low-resource environments. The proposed method not only addresses the challenges in English-Marathi language pairs but also provides a valuable framework for enhancing translation quality in other low-resource language translation tasks.
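The filtering idea in this abstract, keeping a sentence pair only when the embeddings of its two sides are semantically close, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `embed` callable, the toy two-dimensional vectors, and the `0.7` threshold are all hypothetical stand-ins for a real cross-lingual sentence encoder and a tuned cutoff.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_parallel_pairs(pairs, embed, threshold=0.7):
    """Keep (source, target) pairs whose embeddings are similar enough.

    `embed` is any callable mapping a sentence to a vector; in practice it
    would wrap a cross-lingual sentence encoder (assumption). `threshold`
    is a hypothetical cutoff that would need tuning on real data.
    """
    kept = []
    for src, tgt in pairs:
        if cosine_similarity(embed(src), embed(tgt)) >= threshold:
            kept.append((src, tgt))
    return kept


# Toy stand-in for an encoder: a lookup table of made-up vectors.
toy_vectors = {
    "hello": [1.0, 0.0],
    "namaskar": [0.9, 0.1],
    "good night": [0.0, 1.0],
    "mismatched target": [1.0, 0.1],
}
pairs = [("hello", "namaskar"), ("good night", "mismatched target")]
kept = filter_parallel_pairs(pairs, toy_vectors.__getitem__)
print(kept)
```

With these toy vectors the first pair is retained and the second is discarded, mirroring the retain/discard decision the abstract describes.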

Submitted 4 September, 2024; originally announced September 2024.

Comments: Accepted at I2CT 2024

arXiv:2409.02392 [pdf, other]

Building Math Agents with Multi-Turn Iterative Preference Learning

Authors: Wei Xiong, Chengshuai Shi, Jiaming Shen, Aviv Rosenberg, Zhen Qin, Daniele Calandriello, Misha Khalman, Rishabh Joshi, Bilal Piot, Mohammad Saleh, Chi Jin, Tong Zhang, Tianqi Liu

Abstract: Recent studies have shown that large language models' (LLMs) mathematical problem-solving capabilities can be enhanced by integrating external tools, such as code interpreters, and employing multi-turn Chain-of-Thought (CoT) reasoning. While current methods focus on synthetic data generation and Supervised Fine-Tuning (SFT), this paper studies the complementary direct preference learning approach to further improve model performance. However, existing direct preference learning algorithms are originally designed for the single-turn chat task, and do not fully address the complexities of multi-turn reasoning and external tool integration required for tool-integrated mathematical reasoning tasks. To fill in this gap, we introduce a multi-turn direct preference learning framework, tailored for this context, that leverages feedback from code interpreters and optimizes trajectory-level preferences. This framework includes multi-turn DPO and multi-turn KTO as specific implementations. The effectiveness of our framework is validated through training of various language models using an augmented prompt set from the GSM8K and MATH datasets. Our results demonstrate substantial improvements: a supervised fine-tuned Gemma-1.1-it-7B model's performance increased from 77.5% to 83.9% on GSM8K and from 46.1% to 51.2% on MATH. Similarly, a Gemma-2-it-9B model improved from 84.1% to 86.3% on GSM8K and from 51.0% to 54.5% on MATH.

Submitted 3 September, 2024; originally announced September 2024.

Comments: A multi-turn direct preference learning framework for tool-integrated reasoning tasks

arXiv:2408.17254 [pdf, other]

High-resolution observations of recurrent jets from an arch filament system

Authors: Reetika Joshi, Luc Rouppe van der Voort, Brigitte Schmieder, Fernando Moreno-Insertis, Avijeet Prasad, Guillaume Aulanier, Daniel Nóbrega-Siverio

Abstract: Solar jets are collimated plasma ejections along magnetic field lines observed in hot (EUV jets) and cool (chromospheric surges) temperature diagnostics. Their trigger mechanisms and the relationship between hot and cool jets are still not completely understood. We aim to investigate the generation of a sequence of active region solar jets and their evolution from the photospheric to the coronal heights. Using the synergy of high spatial and temporal resolution observations by the SST, along with the SDO, we analyze a sequence of solar jets originating in a mixed polarity region between the leading and following sunspots of an active region. We use an NFFF extrapolation technique for deriving the magnetic field topology of the active region. A mixed polarity region is formed over a long period (24 hours) with persistent magnetic flux emergence. This region has been observed as an arch filament system (AFS) in chromospheric SST observations. In this region, negative polarities surrounded by positive polarities create a fan-surface with a null point at a height of 6 Mm detected in the NFFF extrapolation. SST observations in the H-beta spectral line reveal a large flux rope over the AFS and moving from the North to South, causing successive EUV and cool jets to move in the East-West direction and later towards the South along the long open loops.
The high resolution SST observations (0.038 arcsec per pixel) resolve the dark area observed at the jet base and reveal the existence of an AFS with an extended cool jet which may be the result of a peeling-like mechanism of the AFS. Based on the combined analysis of SST and AIA observations along with extrapolated magnetic topology, it is suggested that the magnetic reconnection site may move southward by approximately 20 Mm until it reaches a region where the open magnetic field lines are oriented North-South.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 14 pages, 11 figures, accepted for publication in A&A

arXiv:2408.13558 [pdf, ps, other]

Combinatorial invariants for certain classes of non-abelian groups

Authors: Naveen K. Godara, Renu Joshi, Eshita Mazumdar

Abstract: This article focuses on the study of zero-sum invariants of finite non-abelian groups. We address two main problems: the first centers on the ordered Davenport constant and the second on Gao's constant. We establish a connection between the ordered Davenport constant and the small Davenport constant for a finite non-abelian group of even order, which in turn gives a relation with the Noether number. Additionally, we confirm a conjecture of Gao and Li for a non-abelian group of order $2p^α$, where $p$ is a prime. Furthermore, we prove a conjecture that connects the ordered Davenport constant to the Loewy length for certain classes of finite $2$-groups.

Submitted 24 August, 2024; originally announced August 2024.

Comments: 15 pages

MSC Class: 11B75; 11P70

arXiv:2408.11796 [pdf, other]

LLM Pruning and Distillation in Practice: The Minitron Approach

Authors: Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov

Abstract: We present a comprehensive report on compressing the Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters, respectively, using pruning and distillation. We explore two distinct pruning strategies: (1) depth pruning and (2) joint hidden/attention/MLP (width) pruning, and evaluate the results on common benchmarks from the LM Evaluation Harness. The models are then aligned with NeMo Aligner and tested in instruct-tuned versions. This approach produces a compelling 4B model from Llama 3.1 8B and a state-of-the-art Mistral-NeMo-Minitron-8B (MN-Minitron-8B for brevity) model from Mistral NeMo 12B. We found that with no access to the original data, it is beneficial to slightly fine-tune teacher models on the distillation dataset. We open-source our base model weights on Hugging Face with a permissive license.

Submitted 26 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: v2: Added missing references. Cleaned up runtime performance section

arXiv:2408.03172 [pdf, other]

doi 10.1109/I2CT61223.2024.10543946

Leveraging Parameter Efficient Training Methods for Low Resource Text Classification: A Case Study in Marathi

Authors: Pranita Deshmukh, Nikita Kulkarni, Sanhita Kulkarni, Kareena Manghani, Raviraj Joshi

Abstract: With the surge in digital content in low-resource languages, there is an escalating demand for advanced Natural Language Processing (NLP) techniques tailored to these languages. BERT (Bidirectional Encoder Representations from Transformers), serving as the foundational framework for numerous NLP architectures and language models, is increasingly employed for the development of low-resource NLP models. Parameter Efficient Fine-Tuning (PEFT) is a method for fine-tuning Large Language Models (LLMs) and reducing the training parameters to some extent to decrease the computational costs needed for training the model and achieve results comparable to a fully fine-tuned model. In this work, we present a study of PEFT methods for the Indic low-resource language Marathi. We conduct a comprehensive analysis of PEFT methods applied to various monolingual and multilingual Marathi BERT models. These approaches are evaluated on prominent text classification datasets like MahaSent, MahaHate, and MahaNews. The incorporation of PEFT techniques is demonstrated to significantly expedite the training speed of the models, addressing a critical aspect of model development and deployment. In this study, we explore Low-Rank Adaptation of Large Language Models (LoRA) and adapter methods for low-resource text classification. We show that these methods are competitive with full fine-tuning and can be used without loss in accuracy.
This study contributes valuable insights into the effectiveness of Marathi BERT models, offering a foundation for the continued advancement of NLP capabilities in Marathi and similar Indic languages.

Submitted 6 August, 2024; originally announced August 2024.

Comments: Accepted at I2CT 2024

arXiv:2407.15816 [pdf]

Efficient and generalizable prediction of molecular alterations in multiple cancer cohorts using H&E whole slide images

Authors: Kshitij Ingale, Sun Hae Hong, Qiyuan Hu, Renyu Zhang, Bo Osinski, Mina Khoshdeli, Josh Och, Kunal Nagpal, Martin C. Stumpe, Rohan P. Joshi

Abstract: Molecular testing of tumor samples for targetable biomarkers is restricted by a lack of standardization, turnaround-time, cost, and tissue availability across cancer types. Additionally, targetable alterations of low prevalence may not be tested in routine workflows. Algorithms that predict DNA alterations from routinely generated hematoxylin and eosin (H&E)-stained images could prioritize samples for confirmatory molecular testing. Costs and the necessity of a large number of samples containing mutations limit approaches that train individual algorithms for each alteration. In this work, models were trained for simultaneous prediction of multiple DNA alterations from H&E images using a multi-task approach. Compared to biomarker-specific models, this approach performed better on average, with pronounced gains for rare mutations. The models reasonably generalized to independent temporal-holdout, externally-stained, and multi-site TCGA test sets. Additionally, whole slide image embeddings derived using multi-task models demonstrated strong performance in downstream tasks that were not a part of training. Overall, this is a promising approach to develop clinically useful algorithms that provide multiple actionable predictions from a single slide.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.14679 [pdf, other]

Compact Language Models via Pruning and Knowledge Distillation

Authors: Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov

Abstract: Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-training it with a fraction (<3%) of the original training data can be a suitable alternative to repeated, full retraining. To this end, we develop a set of practical and effective compression best practices for LLMs that combine depth, width, attention and MLP pruning with knowledge distillation-based retraining; we arrive at these best practices through a detailed empirical exploration of pruning strategies for each axis, methods to combine axes, distillation strategies, and search techniques for arriving at optimal compressed architectures. We use this guide to compress the Nemotron-4 family of LLMs by a factor of 2-4x, and compare their performance to similarly-sized models on a variety of language modeling tasks. Deriving 8B and 4B models from an already pretrained 15B model using our approach requires up to 40x fewer training tokens per model compared to training from scratch; this results in compute cost savings of 1.8x for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature.
We have open-sourced Minitron model weights on Huggingface, with corresponding supplementary material including example code available on GitHub.

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.12869 [pdf, ps, other]

Bilingual Adaptation of Monolingual Foundation Models

Authors: Gurpreet Gosal, Yishi Xu, Gokul Ramakrishnan, Rituraj Joshi, Avraham Sheinin, Zhiming Chen, Biswajit Mishra, Natalia Vassilieva, Joel Hestness, Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Onkar Pandit, Satheesh Katipomu, Samta Kamboj, Samujjwal Ghosh, Rahul Pal, Parvez Mullah, Soundar Doraiswamy, Mohamed El Karim Chami, Preslav Nakov

Abstract: We present an efficient method for adapting a monolingual Large Language Model (LLM) to another language, addressing challenges of catastrophic forgetting and tokenizer limitations. We focus this study on adapting Llama 2 to Arabic. Our two-stage approach begins with expanding the vocabulary and training only the embeddings matrix, followed by full model continual pre-training on a bilingual corpus. By continually pre-training on a mix of Arabic and English corpora, the model retains its proficiency in English while acquiring capabilities in Arabic. Our approach results in significant improvements in Arabic and slight enhancements in English, demonstrating cost-effective cross-lingual transfer. We perform ablations on embedding initialization techniques, data mix ratios, and learning rates and release a detailed training recipe. To demonstrate generalizability of this approach we also adapted Llama 3 8B to Arabic and Llama 2 13B to Hindi.

Submitted 25 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.10940 [pdf, other]

Quantum Control of an Oscillator with a Kerr-cat Qubit

Authors: Andy Z. Ding, Benjamin L. Brock, Alec Eickbusch, Akshay Koottandavida, Nicholas E. Frattini, Rodrigo G. Cortinas, Vidul R. Joshi, Stijn J. de Graaf, Benjamin J. Chapman, Suhas Ganjam, Luigi Frunzio, Robert J. Schoelkopf, Michel H. Devoret

Abstract: Bosonic codes offer a hardware-efficient strategy for quantum error correction by redundantly encoding quantum information in the large Hilbert space of a harmonic oscillator. However, experimental realizations of these codes are often limited by ancilla errors propagating to the encoded logical qubit during syndrome measurements. The Kerr-cat qubit has been proposed as an ancilla for these codes due to its theoretically-exponential noise bias, which would enable fault-tolerant error syndrome measurements, but the coupling required to perform these syndrome measurements has not yet been demonstrated. In this work, we experimentally realize driven parametric coupling of a Kerr-cat qubit to a high-quality-factor microwave cavity and demonstrate a gate set enabling universal quantum control of the cavity. We measure the decoherence of the cavity in the presence of the Kerr-cat and discover excess dephasing due to heating of the Kerr-cat to excited states. By engineering frequency-selective dissipation to counteract this heating, we are able to eliminate this dephasing, thereby demonstrating a high on-off ratio of control. Our results pave the way toward using the Kerr-cat to fault-tolerantly measure error syndromes of bosonic codes.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.05493 [pdf, other]

Conventional s-wave superconductivity and hidden peak effect in single crystals of Mo$_8$Ga$_{41}$ superconductor

Authors: Sunil Ghimire, Kyuil Cho, Kamal R. Joshi, Makariy A. Tanatar, Zhixiang Hu, Cedomir Petrovic, Ruslan Prozorov

Abstract: London and Campbell penetration depths were measured in single crystals of the endohedral gallide cluster superconductor, Mo$_{8}$Ga$_{41}$. The full temperature range superfluid density is consistent with the clean isotropic $s$-wave weak-coupling BCS theory without any signs of the second gap or strong coupling. The temperature dependence of the Campbell length is hysteretic between zero-field cooling (ZFC) and field-cooling (FC) protocols, indicating an anharmonic vortex pinning potential. The field dependence of the effective critical current density, $j_{c}\left(H\right)$, reveals an unusual result. While in the ZFC protocol, $j_{c}\left(H\right)$ is monotonically suppressed by the magnetic field, it exhibits a profound ``hidden'' peak effect in the FC protocol, that is, without a vortex density gradient. We suggest a possible novel mechanism for the formation of the peak effect, which involves both static and dynamic aspects.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05264 [pdf, ps, other]

$θ$-free matching covered graphs

Authors: Rohinee Joshi, Nishad Kothari

Abstract: A nontrivial connected graph is matching covered if each edge belongs to some perfect matching. For most problems pertaining to perfect matchings, one may restrict attention to matching covered graphs; thus, there is extensive literature on them. A cornerstone of this theory is an ear decomposition result due to Lovász and Plummer. Their theorem is a fundamental problem-solving tool, and also yields interesting open problems; we discuss two such problems below, and we solve one of them. A subgraph $H$ of a graph $G$ is conformal if $G-V(H)$ has a perfect matching. This notion is intrinsically related to the aforementioned ear decomposition theorem -- which implies that each matching covered graph (apart from $K_2$ and even cycles) contains a conformal bisubdivision of $θ$, or a conformal bisubdivision of $K_4$, possibly both. (Here, $θ$ refers to the graph with two vertices joined by three edges.) This immediately leads to two problems: characterize $θ$-free (likewise, $K_4$-free) matching covered graphs. A characterization of planar $K_4$-free matching covered graphs was obtained by Kothari and Murty [J. Graph Theory, 82 (1), 2016]; the nonplanar case is open. We provide a characterization of $θ$-free matching covered graphs that immediately implies a poly-time algorithm for the corresponding decision problem. Our characterization relies heavily on a seminal result due to Edmonds, Lovász and Pulleyblank [Combinatorica, 2, 1982] pertaining to the tight cut decomposition theory of matching covered graphs.
As corollaries, we provide two upper bounds on the size of a $θ$-free graph, namely, $m\leq 2n-1$ and $m\leq \frac{3n}{2}+b-1$, where $b$ denotes the number of bricks obtained in any tight cut decomposition of the graph; for each bound, we provide a characterization of the tight examples. The Petersen graph and $K_4$ play key roles in our results.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Submitted to a journal

arXiv:2407.03587 [pdf, other]

Single-gap Isotropic $s$-wave Superconductivity in Single Crystals $\text{AuSn}_4$

Authors: Sunil Ghimire, Kamal R. Joshi, Elizabeth H. Krenkel, Makariy A. Tanatar, Marcin Konczykowski, Romain Grasset, Paul C. Canfield, Ruslan Prozorov

Abstract: London, $λ_L (T)$, and Campbell, $λ_{C} (T)$, penetration depths were measured in single crystals of a topological superconductor candidate $\text{AuSn}_4$. At low temperatures, $λ_L (T)$ is exponentially attenuated and, if fitted with the power law, $λ(T) \sim T^n$, gives exponents $n>4$, indistinguishable from the isotropic single $s$-wave gap Bardeen-Cooper-Schrieffer (BCS) asymptotic. The superfluid density fits perfectly in the entire temperature range to the BCS theory. The superconducting transition temperature, $T_c = 2.40 \pm 0.05\:\text{K}$, does not change after 2.5 MeV electron irradiation, indicating the validity of the Anderson theorem for isotropic $s$-wave superconductors. Campbell penetration depth before and after electron irradiation shows no hysteresis between the zero-field cooling (ZFC) and field cooling (FC) protocols, consistent with the parabolic pinning potential. Interestingly, the critical current density estimated from the original Campbell theory decreases after irradiation, implying that a more sophisticated theory involving collective effects is needed to describe vortex pinning in this system. In general, our thermodynamic measurements strongly suggest that the bulk response of the $\text{AuSn}_4$ crystals is fully consistent with the isotropic $s$-wave weak-coupling BCS superconductivity.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.01148 [pdf, ps, other]

On a conjecture related to the Davenport constant

Authors: Naveen K. Godara, Renu Joshi, Eshita Mazumdar

Abstract: For a finite group $G,$ $D(G)$ is defined as the least positive integer $k$ such that for every sequence $S=g_1 g_2\cdots g_k$ of length $k$ over $G$, there exist $1 \le i_1 < i_2 <\cdots < i_m \le k $ such that $\prod_{j=1}^{m} g_{i_{σ(j)}}=1$ holds for $σ= id,$ identity element of $S_m.$ For a finite abelian group, this group invariant, known as the Davenport constant, is crucial in the theory of non-unique factorization domains. The precise value of this invariant, even for a finite abelian group of rank greater than $2$, is not known yet. In 1977, Olson and White first worked with this invariant for finite non-abelian groups. After that in 2004, Dimitrov dealt with it, where he proved that $D(G)\leq L(G)$ for a finite $p$-group $G$, where $p$ is a prime and $L(G)$ is the Loewy length of $\mathbb{F}_pG.$ He conjectured that equality holds for all finite $p$-groups. In this article, we compute $D(G)$ for a certain subclass of $2$-generated finite $p$-groups of nilpotency class two and show that the conjecture is true by determining the precise value of the Loewy length of $\mathbb{F}_pG.$ We also evaluate $D(G)$ for finite dicyclic, semi-dihedral and some other groups.
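To make the definition of $D(G)$ concrete, the sketch below brute-forces it for the simplest case only: the cyclic group $\mathbb{Z}_n$ written additively, where the ordered product condition reduces to asking for a nonempty subsequence summing to $0 \bmod n$. It is an illustrative toy, not a method from the paper, which treats non-abelian groups; the function names are hypothetical, and the search is exponential, so it is only practical for very small $n$.

```python
from itertools import combinations, combinations_with_replacement


def has_zero_sum_subsequence(seq, n):
    """True if some nonempty subsequence of seq sums to 0 modulo n."""
    for m in range(1, len(seq) + 1):
        for idx in combinations(range(len(seq)), m):
            if sum(seq[i] for i in idx) % n == 0:
                return True
    return False


def davenport_cyclic(n):
    """Least k such that every length-k sequence over Z_n has a zero-sum
    subsequence. Because Z_n is abelian, order is irrelevant, so it is
    enough to enumerate multisets rather than all ordered sequences."""
    k = 1
    while True:
        if all(
            has_zero_sum_subsequence(seq, n)
            for seq in combinations_with_replacement(range(n), k)
        ):
            return k
        k += 1


for n in (2, 3, 4):
    print(n, davenport_cyclic(n))
```

For cyclic groups the classical value is $D(\mathbb{Z}_n) = n$, which the brute-force search reproduces for these small cases; the sequence consisting of $n-1$ copies of the generator $1$ shows that no smaller $k$ works.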

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.11029 [pdf, other]

doi 10.1109/I2CT61223.2024.10544359

Curating Stopwords in Marathi: A TF-IDF Approach for Improved Text Analysis and Information Retrieval

Authors: Rohan Chavan, Gaurav Patil, Vishal Madle, Raviraj Joshi

Abstract: Stopwords are commonly used words in a language that are often considered to be of little value in determining the meaning or significance of a document. These words occur frequently in most texts and don't provide much useful information for tasks like sentiment analysis and text classification. English, a high-resource language, benefits from readily available stopword lists, whereas for low-resource Indian languages like Marathi the standardized lists in available packages are very limited and contain few words. Our work targets the curation of stopwords in the Marathi language using the MahaCorpus, with 24.8 million sentences. We make use of the TF-IDF approach coupled with human evaluation to curate a strong stopword list of 400 words. We apply stopword removal to the text classification task and show its efficacy. The work also presents a simple recipe for stopword curation in a low-resource language. The stopwords are integrated into the mahaNLP library and publicly available on https://github.com/l3cube-pune/MarathiNLP .
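The abstract does not spell out the exact scoring, so the following is only one plausible reading of a TF-IDF-style recipe: rank words by the IDF component (words that occur in nearly every document score lowest), break ties by raw frequency, and pass the top candidates to human reviewers. The function name, the whitespace tokenizer, and the tiny English corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def stopword_candidates(documents, top_k=5):
    """Return the top_k words with the lowest inverse document frequency.

    Ties are broken by higher total frequency, then alphabetically.
    The output is a candidate list for manual review, not a final list.
    """
    n_docs = len(documents)
    doc_freq = Counter()    # number of documents containing each word
    total_freq = Counter()  # total occurrences of each word
    for doc in documents:
        tokens = doc.split()  # naive whitespace tokenization (assumption)
        total_freq.update(tokens)
        doc_freq.update(set(tokens))
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}
    ranked = sorted(idf, key=lambda w: (idf[w], -total_freq[w], w))
    return ranked[:top_k]


# Toy English corpus standing in for a large monolingual corpus.
docs = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "a bird saw the cat",
]
candidates = stopword_candidates(docs, top_k=2)
print(candidates)
```

On this toy corpus the word appearing in every document ranks first; on a real corpus the cutoff and the human filtering step would determine the final list.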

Submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted at I2CT 2024

arXiv:2406.11028 [pdf, other]

doi 10.1109/I2CT61223.2024.10543381

Universal Cross-Lingual Text Classification

Authors: Riya Savant, Anushka Shelke, Sakshi Todmal, Sanskruti Kanphade, Ananya Joshi, Raviraj Joshi

Abstract: Text classification, an integral task in natural language processing, involves the automatic categorization of text into predefined classes. Creating supervised labeled datasets for low-resource languages poses a considerable challenge. Unlocking the language potential of low-resource languages requires robust datasets with supervised labels. However, such datasets are scarce, and the label space is often limited. In our pursuit to address this gap, we aim to optimize existing labels/datasets in different languages. This research proposes a novel perspective on Universal Cross-Lingual Text Classification, leveraging a unified model across languages. Our approach involves blending supervised data from different languages during training to create a universal model. The supervised data for a target classification task might come from different languages covering different labels. The primary goal is to enhance label and language coverage, aiming for a label set that represents a union of labels from various languages. We propose the usage of a strong multilingual SBERT as our base model, making our novel training strategy feasible. This strategy contributes to the adaptability and effectiveness of the model in cross-lingual language transfer scenarios, where it can categorize text in languages not encountered during training.
Thus, the paper delves into the intricacies of cross-lingual text classification, with a particular focus on its application for low-resource languages, exploring methodologies and implications for the development of a robust and adaptable universal cross-lingual model.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted at I2CT 2024

arXiv:2406.03989 [pdf, other]

Numerical Simulation of Radiatively driven Transonic Relativistic Jets

Authors: Raj Kishor Joshi, Indranil Chattopadhyay, Antonios Tsokaros, Priyesh Kumar Tripathi

Abstract: We perform the numerical simulations of axisymmetric, relativistic, optically thin jets under the influence of the radiation field of an accretion disk. We show that starting from a very low injection velocity at the base, jets can be accelerated to relativistic terminal speeds when traveling through the radiation field. The jet gains momentum through the interaction with the radiation field. We use a relativistic equation of state for multi-species plasma, which self-consistently calculates the adiabatic index for the jet material. All the jet solutions obtained are transonic in nature. In addition to the acceleration of the jet to relativistic speeds, our results show that the radiation field also acts as a collimating agent. The jets remain well collimated under the effect of radiation pressure. We also show that if the jet starts with a rotational velocity, the radiation field will reduce the angular momentum of the jet beam.

Submitted 30 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: 19 pages, 11 figures, accepted for publication in ApJ

arXiv:2405.19107 [pdf, ps, other]

Offline Regularised Reinforcement Learning for Large Language Models Alignment

Authors: Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Avila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Remi Munos, Bilal Piot

Abstract: The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses, yielding a preferred and a dis-preferred response. Such data is typically scarce and expensive to collect. On the other hand, single-trajectory datasets, where each element is a triplet composed of a prompt, a response and human feedback, are naturally more abundant. The canonical element of such datasets is, for instance, an LLM's response to a user's prompt followed by a user's feedback such as a thumbs-up/down. Consequently, in this work, we propose DRO, or Direct Reward Optimisation, as a framework and associated algorithms that do not require pairwise preferences. DRO uses a simple mean-squared objective that can be implemented in various ways. We validate our findings empirically, using T5 encoder-decoder language models, and show DRO's performance over selected baselines such as Kahneman-Tversky Optimization (KTO). Thus, we confirm that DRO is a simple and empirically compelling method for single-trajectory policy optimisation.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.07933 [pdf, other]

Authentic Hand Avatar from a Phone Scan via Universal Hand Model

Authors: Gyeongsik Moon, Weipeng Xu, Rohan Joshi, Chenglei Wu, Takaaki Shiratori

Abstract: An authentic 3D hand avatar with all identifiable information, such as hand shapes and textures, is necessary for immersive experiences in AR/VR. In this paper, we present a universal hand model (UHM), which 1) can universally represent high-fidelity 3D hand meshes of arbitrary identities (IDs) and 2) can be adapted to each person with a short phone scan for the authentic hand avatar. For effective universal hand modeling, we perform tracking and modeling at the same time, while previous 3D hand models perform them separately. The conventional separate pipeline suffers from the accumulated errors from the tracking stage, which cannot be recovered in the modeling stage. On the other hand, ours does not suffer from the accumulated errors while having a much more concise overall pipeline. We additionally introduce a novel image matching loss function to address skin sliding during tracking and modeling, which existing works have not focused on much. Finally, using learned priors from our UHM, we effectively adapt our UHM to each person's short phone scan for the authentic hand avatar.

Submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR 2024

arXiv:2404.18228 [pdf, other]

doi 10.1007/978-3-031-58495-4_12

TextGram: Towards a better domain-adaptive pretraining

Authors: Sharayu Hiwarkhedkar, Saloni Mittal, Vidula Magdum, Omkar Dhekane, Raviraj Joshi, Geetanjali Kale, Arnav Ladkat

Abstract: For green AI, it is crucial to measure and reduce the carbon footprint emitted during the training of large language models. In NLP, performing pre-training on Transformer models requires significant computational resources. This pre-training involves using a large amount of text data to gain prior knowledge for performing downstream tasks. Thus, it is important that we select the correct data in the form of domain-specific data from this vast corpus to achieve optimum results aligned with our domain-specific tasks. While training on large unsupervised data is expensive, it can be optimized by performing a data selection step before pretraining. Selecting important data reduces the space overhead and the substantial amount of time required to pre-train the model while maintaining constant accuracy. We investigate the existing selection strategies and propose our own domain-adaptive data selection method - TextGram - that effectively selects essential data from large corpora. We compare and evaluate the results of finetuned models for text classification task with and without data selection. We show that the proposed strategy works better compared to other selection methods.

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted at SPELLL 2023

arXiv:2404.18216 [pdf, other]

doi 10.1007/978-3-031-58495-4_4

L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi

Authors: Saloni Mittal, Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Raviraj Joshi

Abstract: The availability of text or topic classification datasets in the low-resource Marathi language is limited, typically consisting of fewer than 4 target labels, with some achieving nearly perfect accuracy. In this work, we introduce L3Cube-MahaNews, a Marathi text classification corpus that focuses on News headlines and articles. This corpus stands out as the largest supervised Marathi Corpus, containing over 1.05L records classified into a diverse range of 12 categories. To accommodate different document lengths, MahaNews comprises three supervised datasets specifically designed for short text, long documents, and medium paragraphs. The consistent labeling across these datasets facilitates document length-based analysis. We provide detailed data statistics and baseline results on these datasets using state-of-the-art pre-trained BERT models. We conduct a comparative analysis between monolingual and multilingual BERT models, including MahaBERT, IndicBERT, and MuRIL. The monolingual MahaBERT model outperforms all others on every dataset. These resources also serve as Marathi topic classification datasets or models and are publicly available at https://github.com/l3cube-pune/MarathiNLP .

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted at SPELLL 2023

arXiv:2404.13364 [pdf, other]

MahaSQuAD: Bridging Linguistic Divides in Marathi Question-Answering

Authors: Ruturaj Ghatage, Aditya Kulkarni, Rajlaxmi Patil, Sharvi Endait, Raviraj Joshi

Abstract: Question-answering systems have revolutionized information retrieval, but linguistic and cultural boundaries limit their widespread accessibility. This research endeavors to bridge the gap of the absence of efficient QnA datasets in low-resource languages by translating the English Question Answering Dataset (SQuAD) using a robust data curation approach. We introduce MahaSQuAD, the first-ever full SQuAD dataset for the Indic language Marathi, consisting of 118,516 training, 11,873 validation, and 11,803 test samples. We also present a gold test set of manually verified 500 examples. Challenges in maintaining context and handling linguistic nuances are addressed, ensuring accurate translations. Moreover, as a QnA dataset cannot be simply converted into any low-resource language using translation, we need a robust method to map the answer translation to its span in the translated passage. Hence, to address this challenge, we also present a generic approach for translating SQuAD into any low-resource language. Thus, we offer a scalable approach to bridge linguistic and cultural gaps present in low-resource languages, in the realm of question-answering systems. The datasets and models are shared publicly at https://github.com/l3cube-pune/MarathiNLP .

Submitted 20 April, 2024; originally announced April 2024.

Comments: Accepted at the International Conference on Natural Language Processing (ICON 2023)

arXiv:2404.13171 [pdf, other]

doi 10.1051/0004-6361/202449553

Generic low-atmosphere signatures of swirled-anemone jets

Authors: Reetika Joshi, Guillaume Aulanier, Alice Radcliffe, Luc Rouppe van der Voort, Etienne Pariat, Daniel Nóbrega-Siverio, Brigitte Schmieder

Abstract: Solar jets are collimated plasma flows moving along magnetic field lines and accelerated at low altitude following magnetic reconnection. Several of them originate from anemone-shaped low-lying arcades and the most impulsive ones tend to be relatively wider and display untwisting motions. We aim to establish typical behaviours and observational signatures in the low atmosphere that can occur in response to the coronal development of such impulsive jets. We analysed an observed solar jet associated with a circular flare ribbon, using high-resolution observations from SST coordinated with IRIS and SDO. We related specifically identified features with those developing in a generic 3D line-tied numerical simulation of reconnection-driven jets, performed with the ARMS code. We identified three features in the SST observations: the formation of a hook along the circular ribbon, the gradual widening of the jet through the apparent displacement of its kinked edge towards, and not away from, the presumed reconnection site, and the falling back of some of the jet plasma towards a footpoint offset from that of the jet itself. The 3D numerical simulation naturally accounts for these features, which were not imposed a priori. Our analyses allow us to interpret them in the context of the 3D geometry of the asymmetric swirled anemone loops and their sequences of reconnection with ambient coronal loops.
Given the relatively simple conditions in which the observed jet occurred, together with the generic nature of the simulation that comprised minimum assumptions, we predict that the specific features that we identified and interpreted are probably typical of every impulsive jet.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 13 pages, 8 figures, Accepted for publication in Astronomy and Astrophysics

Journal ref: A&A 687, A172 (2024)

arXiv:2403.16796 [pdf]

Development and Assessment of a Miniaturized Thermocouple for Precise Temperature Measurement in Biological Tissues and Cells

Authors: Onnop Srivannavit, Rakesh Joshi, Weibin Zhu, Bin Gong, Stuart C. Sealfon, Theodorian Borca-Tasciuc, Angelo Gaitas

Abstract: This study presents a novel thermocouple instrument designed for precise temperature monitoring within biological tissues and cells, addressing a significant gap in biological research. Constructed on a Silicon-On-Insulator (SOI) substrate, the instrument employs doped silicon and chromium/gold junctions, achieving a Seebeck coefficient of up to 447 µV/K, rapid response times, high temperature accuracy, and the necessary durability for tissue measurements. The cleanroom fabrication process yields a device featuring a triangular sensing tip. Using Finite Element Analysis (FEA) with COMSOL Multiphysics, the research delves into the device's thermal time constant within tissue environments. The device's efficacy in biological settings was validated by measuring temperatures inside ex-vivo tissue samples. Our findings, bolstered by FEA COMSOL simulations, confirm the device's robustness and applicability in biological studies. This advancement in thermocouple microneedle technology provides biologists with an instrument for accurately tracking temperature fluctuations in tissues.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16279 [pdf, other]

The nontrivial effects of annealing on superconducting properties of Nb single crystals

Authors: Amlan Datta, Kamal R. Joshi, Giulia Berti, Sunil Ghimire, Aidan Goerdt, Makariy A. Tanatar, Deborah L. Schlagel, Matthew F. Besser, Dapeng Jing, Matthew Kramer, Maria Iavarone, Ruslan Prozorov

Abstract: The effect of annealing on the superconducting properties of niobium single crystals cut from the same master boule was studied by local and global magnetic measurements, as well as scanning tunneling microscopy (STM). The formation of large hydride precipitates was observed in unannealed samples. The variation in structural and magnetic properties was studied after annealing under high vacuum at 800 C, 1400 C, and near the melting point of niobium (2477 C) for a few seconds. The initial samples had a high hydrogen content. Polarized optics and magneto-optical studies show that the formation of large niobium hydride precipitates is suppressed already by 800 C annealing. However, the overall superconducting properties in the annealed samples did not improve after annealing, and in fact, worsened. The superconducting transition temperature decreased, the upper critical field increased, and the pinning strength increased. Parallel studies were conducted using STM, where the sample was annealed initially at 400 C, measured, annealed again at 1700 C, and measured again. These studies revealed a "dirty" superconducting gap with a significant spatial variation of tunneling conductance after annealing at 400 C. The clean gap was recovered after annealing at 1700 C. It is likely that these results are due to oxygen redistribution near the surface, which is always covered by oxide layers in as-grown crystals.
Overall, the results indicate that vacuum annealing at least up to 1400 C, while expected to remove a large amount of hydrogen, introduces additional nanosized defects, perhaps hydride precipitates, that act as efficient pair-breaking and pinning centers.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.14891 [pdf, other]

Creep-enhanced vortex pinning revealed through nonmonotonic relaxation of the Campbell length

Authors: Sunil Ghimire, Filippo Gaggioli, Kamal R. Joshi, Marcin Konczykowski, Romain Grasset, Elizabeth H. Krenkel, Amlan Datta, Makariy A. Tanatar, Shuzhang Chen, Cedomir Petrovic, Vadim B. Geshkenbein, Ruslan Prozorov

Abstract: We study the effects of flux creep on the linear AC response of the vortex lattice in single crystals Ca$_3$Ir$_4$Sn$_{13}$ by measuring the Campbell penetration depth, $λ_{\rm \scriptscriptstyle C}(T,H,t)$. Thermal fluctuations release vortices from shallow pinning sites, only for them to become re-trapped by deeper potential wells, causing an initial increase of the effective Labusch parameter, which is proportional to the pinning well curvature. This effect cannot be detected in conventional magnetic relaxation measurements but is revealed by our observation of a nonmonotonic time evolution of $λ_{\rm \scriptscriptstyle C}(T,H,t)$, which directly probes the average curvature of the occupied pinning centers. The time evolution of $λ_{\rm \scriptscriptstyle C}(T,H,t)$ was measured at different temperatures in samples with different densities of pinning centers produced by electron irradiation. The curves can be collapsed together when plotted on a logarithmic time scale $t \to T\ln{(t/t_0)}$ confirming that the time evolution is driven by flux creep. The $λ_{\rm \scriptscriptstyle C}(T,H,t)$ is hysteretic with a noticeable nonmonotonic relaxation in the presence of a vortex density gradient (after zero-field cooling), but is monotonic after field cooling, where the vortex density is uniform. This result quantitatively corroborates the novel picture of vortex creep based on the strong pinning theory.

Submitted 31 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.11652 [pdf, other]

doi 10.1051/0004-6361/202348894

Small-scale magnetic flux emergence preceding a chain of energetic solar atmospheric events

Authors: D. Nóbrega-Siverio, I. Cabello, S. Bose, L. H. M. Rouppe van der Voort, R. Joshi, C. Froment, V. M. J. Henriques

Abstract: Advancements in instrumentation have revealed a multitude of small-scale EUV events in the solar atmosphere. Our aim is to employ high-resolution magnetograms to gain a detailed understanding of the magnetic origin of such phenomena. We have used coordinated observations from SST, IRIS, and SDO to analyze an ephemeral magnetic flux emergence episode and the following chain of small-scale energetic events. These unique observations clearly link these phenomena together. The high-resolution (0."057/pixel) magnetograms obtained with SST/CRISP allow us to reliably measure the magnetic field at the photosphere and detect the emerging bipole that causes the subsequent eruptive atmospheric events. Notably, this small-scale emergence episode remains indiscernible in the lower resolution SDO/HMI magnetograms (0."5/pixel). We report the appearance of a dark bubble in Ca II K related to the emerging bipole, a sign of the canonical expanding magnetic dome predicted in flux emergence simulations. Evidence of reconnection is also found: first through an Ellerman bomb, and later by the launch of a surge next to a UV burst. The UV burst exhibits a weak EUV counterpart in the coronal SDO/AIA channels. By calculating DEM, its plasma is shown to reach a temperature beyond 1 MK and have densities between the upper chromosphere and transition region. Our study showcases the importance of high-resolution magnetograms to unveil the mechanisms triggering phenomena such as EBs, UV bursts, and surges.
This could hold implications for small-scale events akin to those recently reported in EUV using Solar Orbiter. The finding of temperatures beyond 1 MK in the UV burst plasma strongly suggests that we are examining analogous features. Therefore, we signal caution regarding drawing conclusions from full-disk magnetograms that lack the necessary resolution to reveal their true magnetic origin.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted in A&A, 11 pages, 7 figures, 5 movies

Journal ref: A&A 686, A218 (2024)

arXiv:2403.11290 [pdf, other]

A sample of 25 radio galaxies with highly unusual radio morphologies, selected from the LoTSS-DR2 survey at 144 MHz

Authors: Gopal-Krishna, Dusmanta Patra, Ravi Joshi

Abstract: From a careful visual scrutiny of the radio structures of a well-defined sample of 2428 sources in the LoTSS DR2 survey made at 144 MHz with a 6" beam, we have selected a subset of 25 (i.e., 1%) sources showing highly unusual radio structures, not conforming to the prevalent radio morphological classification. Here we present and briefly discuss the basic properties of these rare morphological outliers and attempt to dissect their morphological peculiarities, based on multi-wavelength radio images and radio-optical overlays. Also, we underscore the need to accord due importance to such anomalous radio sources, considering the challenge they pose to the standard theoretical models and simulations of extragalactic double radio sources.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 28 pages, 25 figures, submitted to Journal of Astrophysics and Astronomy, Comments welcome

arXiv:2403.08635 [pdf, other]

Human Alignment of Large Language Models through Online Preference Optimisation

Authors: Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Abstract: Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Policy Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contribution is two-fold. First, we show the equivalence between two recent alignment methods, namely Identity Policy Optimisation (IPO) and Nash Mirror Descent (Nash-MD). Second, we introduce a generalisation of IPO, named IPO-MD, that leverages the regularised sampling approach proposed by Nash-MD. This equivalence may seem surprising at first sight, since IPO is an offline method whereas Nash-MD is an online method using a preference model. However, this equivalence can be proven when we consider the online version of IPO, that is when both generations are sampled by the online policy and annotated by a trained preference model. Optimising the IPO loss with such a stream of data becomes then equivalent to finding the Nash equilibrium of the preference model through self-play. Building on this equivalence, we introduce the IPO-MD algorithm that generates data with a mixture policy (between the online and reference policy) similarly as the general Nash-MD algorithm. We compare online-IPO and IPO-MD to different online versions of existing losses on preference data such as DPO and SLiC on a summarisation task.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04981 [pdf, other]

Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate

Authors: Zijian Zhao, Sola Woo, Khandker Akif Aabrar, Sharadindu Gopal Kirtania, Zhouhang Jiang, Shan Deng, Yi Xiao, Halid Mulaosmanovic, Stefan Duenkel, Dominik Kleimaier, Steven Soss, Sven Beyer, Rajiv Joshi, Scott Meninger, Mohamed Mohamed, Kijoon Kim, Jongho Woo, Suhwan Lim, Kwangsoo Kim, Wanki Kim, Daewon Ha, Vijaykrishnan Narayanan, Suman Datta, Shimeng Yu, Kai Ni

Abstract: In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-${V}_{TH}$ (LVT) state; ii) combined simulations and experimental demonstrations of dual-port design verify the disturb-free operation in a NAND string, overcoming a key challenge in single-port designs; iii) the proposed design can be incorporated in a highly scaled vertical NAND FeFET string and the pass gate can be incorporated into the existing 3D NAND with the negligible overhead of the pass gate interconnection through a global bottom pass gate contact in the substrate.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 29 pages, 7 figures

arXiv:2402.09545 [pdf]

A 3D Memristor Architecture for In-Memory Computing Demonstrated with SHA3

Authors: Muayad J. Aljafar, Rasika Joshi, John M. Acken

Abstract: Security is a growing problem that needs hardware support. Memristors provide an alternative technology for hardware-supported security implementation. This paper presents a specific technique that utilizes the benefits of hybrid CMOS-memristors technology demonstrated with SHA3 over implementations that use only memristor technology. In the proposed technique, SHA3 is implemented in a set of perpendicular crossbar arrays structured to facilitate logic implementation and circular bit rotation (Rho operation), which is perhaps the most complex operation in SHA3 when carried out in memristor arrays. The Rho operation itself is implemented with CMOS multiplexers (MUXs). The proposed accelerator is standby power-free and circumvents the memory access bottleneck in conventional computers. In addition, our design obscures the intermediate values from the I/O interface and outperforms the state-of-the-art memristor-based designs in terms of size and energy. Demonstrating the memristor implementation of SHA3 provides an impetus for utilizing memristors in information security applications.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 14 pages, 4 tables, 12 figures

arXiv:2402.06185 [pdf, other]

Development and validation of an artificial intelligence model to accurately predict spinopelvic parameters

Authors: Edward S. Harake, Joseph R. Linzey, Cheng Jiang, Rushikesh S. Joshi, Mark M. Zaki, Jaes C. Jones, Siri S. Khalsa, John H. Lee, Zachary Wilseck, Jacob R. Joseph, Todd C. Hollon, Paul Park

Abstract: Objective. Achieving appropriate spinopelvic alignment has been shown to be associated with improved clinical symptoms. However, measurement of spinopelvic radiographic parameters is time-intensive and interobserver reliability is a concern. Automated measurement tools have the promise of rapid and consistent measurements, but existing tools are still limited by some degree of manual user-entry requirements. This study presents a novel artificial intelligence (AI) tool called SpinePose that automatically predicts spinopelvic parameters with high accuracy without the need for manual entry. Methods. SpinePose was trained and validated on 761 sagittal whole-spine X-rays to predict sagittal vertical axis (SVA), pelvic tilt (PT), pelvic incidence (PI), sacral slope (SS), lumbar lordosis (LL), T1-pelvic angle (T1PA), and L1-pelvic angle (L1PA). A separate test set of 40 X-rays was labeled by 4 reviewers, including fellowship-trained spine surgeons and a fellowship-trained radiologist with neuroradiology subspecialty certification. Median errors relative to the most senior reviewer were calculated to determine model accuracy on test images. Intraclass correlation coefficients (ICC) were used to assess inter-rater reliability. Results. SpinePose exhibited the following median (interquartile range) parameter errors: SVA: 2.2(2.3)mm, p=0.93; PT: 1.3(1.2)°, p=0.48; SS: 1.7(2.2)°, p=0.64; PI: 2.2(2.1)°, p=0.24; LL: 2.6(4.0)°, p=0.89; T1PA: 1.1(0.9)°, p=0.42; and L1PA: 1.4(1.6)°, p=0.49. Model predictions also exhibited excellent reliability at all parameters (ICC: 0.91-1.0). Conclusions. SpinePose accurately predicted spinopelvic parameters with excellent reliability comparable to fellowship-trained spine surgeons and neuroradiologists. Utilization of predictive AI tools in spinal imaging can substantially aid in patient selection and surgical planning.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 10 pages, 5 figures, to appear in Journal of Neurosurgery: Spine

arXiv:2402.01878 [pdf, other]

LiPO: Listwise Preference Optimization through Learning-to-Rank

Authors: Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang

Abstract: Aligning language models (LMs) with curated human feedback is critical to control their behaviors in real-world applications. Several recent policy optimization methods, such as DPO and SLiC, serve as promising alternatives to the traditional Reinforcement Learning from Human Feedback (RLHF) approach. In practice, human feedback often comes in the format of a ranked list over multiple responses to amortize the cost of reading the prompt. Multiple responses can also be ranked by reward models or AI feedback. However, a thorough study of fitting directly on a list of responses is lacking. In this work, we formulate the LM alignment as a \textit{listwise} ranking problem and describe the LiPO framework, where the policy can potentially learn more effectively from a ranked list of plausible responses given the prompt. This view draws an explicit connection to Learning-to-Rank (LTR), where most existing preference optimization work can be mapped to existing ranking objectives. Following this connection, we provide an examination of ranking objectives that are not well studied for LM alignment, with DPO and SLiC as special cases when list size is two. In particular, we highlight a specific method, LiPO-$λ$, which leverages a state-of-the-art \textit{listwise} ranking objective and weights each preference pair in a more advanced manner. We show that LiPO-$λ$ can outperform DPO variants and SLiC by a clear margin on several preference alignment tasks with both curated and real rankwise preference data.

Submitted 22 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.12518 [pdf, ps, other]

Existence of three distinct scaling regimes in self-propelled rigid pitching airfoil

Authors: Rakshita Joshi, Jaywant Arakeri

Abstract: Oscillating foils in self-propelled mode are the simplest model for investigating oscillatory locomotion in cruising fishes. In this investigation, we explore the self-propulsion characteristics of a NACA0015 section airfoil, with chord length $C$, subjected to sinusoidal pitching using a rotary apparatus. A power-spring-based crank-rocker mechanism actuates the airfoil. We examine the effect of pitching frequency ($f$), amplitude ($A$), and the pitching point location ($p$) on the self-propulsion speed, $U_s$. We present the results in terms of self-propulsion Reynolds number ($Re_s = U_sC/ν$), reduced frequency ($k_s$), Strouhal number ($St$), and two non-dimensional speeds, $U^*_{BL}$ (body length per oscillation) and $U^*_{AL}$ (forward speed in terms of trailing edge excursion per oscillation). $U_s$ increases with frequency and amplitude, but a diminishing effect of amplitude was noted at larger amplitudes. Highest speeds were achieved when pitched closest to the leading edge, with $St$ in the range of $0.2 - 0.4$. Three distinct scaling regimes are identified, each characterized by a specific relationship between $Re_s$ and trailing-edge Reynolds number, $Re_{TE} = fAC/ν$. When pitching at low amplitude, close to the leading edge, $Re_s \sim (1-2p) Re_{TE}^{3/2}$ (power scaling). For higher amplitude pitching, $Re_s \sim Re_{TE} (1-2p)^{1/2} (A/C)^{-1/2}$ (separable scaling). When pitched close to the midpoint, the airfoil propels beyond a threshold $Re_{TE,0}$, and $Re_s$ increases linearly with $Re_{TE}$ (linear scaling). Notably, in separable and linear regimes, self-propulsion is independent of viscosity. These relations collectively offer a comprehensive framework for understanding the self-propulsion of rigid pitching airfoils across a wide range of parameters.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.07786 [pdf, other]

Oscillating shocks in the transonic, viscous, variable $Γ$, accretion flows around black holes

Authors: Sanjit Debnath, Indranil Chattopadhyay, Raj Kishor Joshi

Abstract: We investigate the time evolution of the transonic-viscous accretion flow around a non-rotating black hole. The input parameters used for the simulation are obtained from semi-analytical solutions. This code is based on the TVD routine and correctly handles the angular momentum transport due to viscosity. The thermodynamic properties of the flow are described by a variable adiabatic index equation of state. We regenerate the inviscid and viscous steady-state solutions, including shocks, using the simulation code and compare them with the semi-analytical solutions. The angular momentum piles up across a shock due to shock-jump conditions and viscous transport of angular momentum. This will push the shock-front outward and can result in shock oscillation or a complete destabilization of shock. We study how shocks behave in the presence of viscosity. As the viscosity parameter ($α$) crosses a critical value, the previously steady shock becomes time-dependent, eventually leading to oscillations. The value of this critical viscosity depends on the injection angular momentum ($λ_{ou}$) and the specific energy ($ε$). We estimated the bremsstrahlung and synchrotron cooling a posteriori, and the net radiative output also oscillates with the frequency of the shock. We also study the variation of frequency, amplitude, and mean position of oscillation with $α$. Considering a black hole with a mass of $10M_{\odot}$, we observed that the power spectrum exhibits a prominent peak at the fundamental frequency of a few to about tens of Hz, accompanied by multiple harmonics. This characteristic is frequently observed in numerous accreting black hole candidates.

Submitted 15 January, 2024; originally announced January 2024.

Comments: Accepted for publication in MNRAS; 18 pages, 16 figures

arXiv:2401.07094 [pdf, ps, other]

The Bogomolov multiplier of a multiplicative Lie algebra

Authors: Amit Kumar, Renu Joshi, Mani Shankar Pandey, Sumit Kumar Upadhyay

Abstract: In this paper, we develop the concept of the Bogomolov multiplier for a multiplicative Lie algebra and establish a Hopf-type formula. Consequently, we see that the Bogomolov multipliers of two isoclinic multiplicative Lie algebras are isomorphic.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.05334 [pdf, other]

URHand: Universal Relightable Hands

Authors: Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo, Chen Cao, Stanislav Pidhorskyi, Tomas Simon, Rohan Joshi, Yuan Dong, Yichen Xu, Bernardo Pires, He Wen, Lucas Evans, Bo Peng, Julia Buffalini, Autumn Trimble, Kevyn McPhail, Melissa Schoeller, Shoou-I Yu, Javier Romero, Michael Zollhöfer, Yaser Sheikh, Ziwei Liu, Shunsuke Saito

Abstract: Existing photorealistic relightable hand models require extensive identity-specific observations in different views, poses, and illuminations, and face challenges in generalizing to natural illuminations and novel identities. To bridge this gap, we present URHand, the first universal relightable hand model that generalizes across viewpoints, poses, illuminations, and identities. Our model allows few-shot personalization using images captured with a mobile phone, and is ready to be photorealistically rendered under novel illuminations. To simplify the personalization process while retaining photorealism, we build a powerful universal relightable prior based on neural relighting from multi-view images of hands captured in a light stage with hundreds of identities. The key challenge is scaling the cross-identity training while maintaining personalized fidelity and sharp details without compromising generalization under natural illuminations. To this end, we propose a spatially varying linear lighting model as the neural renderer that takes physics-inspired shading as input feature. By removing non-linear activations and bias, our specifically designed lighting model explicitly keeps the linearity of light transport. This enables single-stage training from light-stage data while generalizing to real-time rendering under arbitrary continuous illuminations across diverse identities. In addition, we introduce the joint learning of a physically based model and our neural relighting model, which further improves fidelity and generalization. Extensive experiments show that our approach achieves superior performance over existing methods in terms of both quality and generalizability. We also demonstrate quick personalization of URHand from a short phone scan of an unseen identity.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Project Page https://frozenburning.github.io/projects/urhand/

arXiv:2401.02254 [pdf, other]

L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages

Authors: Aishwarya Mirashi, Srushti Sonavane, Purva Lingayat, Tejas Padhiyar, Raviraj Joshi

Abstract: In this work, we introduce L3Cube-IndicNews, a multilingual text classification corpus aimed at curating a high-quality dataset for Indian regional languages, with a specific focus on news headlines and articles. We have centered our work on 10 prominent Indic languages, including Hindi, Bengali, Marathi, Telugu, Tamil, Gujarati, Kannada, Odia, Malayalam, and Punjabi. Each of these news datasets comprises 10 or more classes of news articles. L3Cube-IndicNews offers 3 distinct datasets tailored to handle different document lengths that are classified as: Short Headlines Classification (SHC) dataset containing the news headline and news category, Long Document Classification (LDC) dataset containing the whole news article and the news category, and Long Paragraph Classification (LPC) containing sub-articles of the news and the news category. We maintain consistent labeling across all 3 datasets for in-depth length-based analysis. We evaluate each of these Indic language datasets using 4 different models including monolingual BERT, multilingual Indic Sentence BERT (IndicSBERT), and IndicBERT. This research contributes significantly to expanding the pool of available text classification datasets and also makes it possible to develop topic classification models for Indian regional languages. This also serves as an excellent resource for cross-lingual analysis owing to the high overlap of labels among languages. The datasets and models are shared publicly at https://github.com/l3cube-pune/indic-nlp

Submitted 26 April, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted at the International Conference on Natural Language Processing (ICON 2023)

arXiv:2401.00170 [pdf, ps, other]

doi 10.1145/3632754.3632764

L3Cube-MahaSocialNER: A Social Media based Marathi NER Dataset and BERT models

Authors: Harsh Chaudhari, Anuja Patil, Dhanashree Lavekar, Pranav Khairnar, Raviraj Joshi

Abstract: This work introduces the L3Cube-MahaSocialNER dataset, the first and largest social media dataset specifically designed for Named Entity Recognition (NER) in the Marathi language. The dataset comprises 18,000 manually labeled sentences covering eight entity classes, addressing challenges posed by social media data, including non-standard language and informal idioms. Deep learning models, including CNN, LSTM, BiLSTM, and Transformer models, are evaluated on the individual dataset with IOB and non-IOB notations. The results demonstrate the effectiveness of these models in accurately recognizing named entities in Marathi informal text. The L3Cube-MahaSocialNER dataset offers user-centric information extraction and supports real-time applications, providing a valuable resource for public opinion analysis, news, and marketing on social media platforms. We also show that the zero-shot results of the regular NER model are poor on the social NER test set thus highlighting the need for more social NER datasets. The datasets and models are publicly available at https://github.com/l3cube-pune/MarathiNLP

Submitted 30 December, 2023; originally announced January 2024.

Comments: Accepted at Forum for Information Retrieval Evaluation (FIRE 2023)

arXiv:2312.17166 [pdf, other]

doi 10.1088/1361-648X/ad3539

Signatures of quantum phases in a dissipative system

Authors: Rohan Joshi, Saikat Mondal, Souvik Bandyopadhyay, Sourav Bhattacharjee, Adhip Agarwala

Abstract: Lindbladian formalism, as tuned to dissipative and open systems, has been all-pervasive to interpret non-equilibrium steady states of quantum many-body systems. We study the fate of free fermionic and superconducting phases in a dissipative one-dimensional Kitaev model - where the bath acts both as a source and a sink of fermionic particles with different coupling rates. As a function of these two couplings, we investigate the steady state, its entanglement content, and its approach from varying initial states. Interestingly, we find that the steady state phase diagram retains decipherable signatures of ground state critical physics. We also show that early-time fidelity is a useful marker to find a subclass of phase transitions in such situations. Moreover, we show that the survival of critical signatures at late times strongly depends on the thermal nature of the steady state. This connection hints at a correspondence between quantum observables and classical magnetism in the steady state of such systems. Our work uncovers interesting connections between dissipative quantum many-body systems, thermalization of a classical spin and many-body quantum critical phenomena.

Submitted 16 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 14 pages, 10 figures

Journal ref: Journal of Physics: Condensed Matter 36 (2024) 275601

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.07609 [pdf, other]

doi 10.1007/978-3-031-49601-1_11

IndoorGNN: A Graph Neural Network based approach for Indoor Localization using WiFi RSSI

Authors: Rahul Vishwakarma, Rucha Bhalchandra Joshi, Subhankar Mishra

Abstract: Indoor localization is the process of determining the location of a person or object inside a building. Potential usage of indoor localization includes navigation, personalization, safety and security, and asset tracking. Commonly used technologies for indoor localization include WiFi, Bluetooth, RFID, and Ultra-wideband. Among these, WiFi's Received Signal Strength Indicator (RSSI)-based localization is preferred because of widely available WiFi Access Points (APs). We have two main contributions. First, we develop our method, 'IndoorGNN' which involves using a Graph Neural Network (GNN) based algorithm in a supervised manner to classify a specific location into a particular region based on the RSSI values collected at that location. Most of the ML algorithms that perform this classification require a large number of labeled data points (RSSI vectors with location information). Collecting such data points is a labor-intensive and time-consuming task. To overcome this challenge, as our second contribution, we demonstrate the performance of IndoorGNN on the restricted dataset. It shows a comparable prediction accuracy to that of the complete dataset. We performed experiments on the UJIIndoorLoc and MNAV datasets, which are real-world standard indoor localization datasets. Our experiments show that IndoorGNN gives better location prediction accuracies when compared with state-of-the-art existing conventional as well as GNN-based methods for this same task. It continues to outperform these algorithms even with restricted datasets. It is noteworthy that its performance does not decrease a lot with a decrease in the number of available data points. Our method can be utilized for navigation and wayfinding in complex indoor environments, asset tracking and building management, enhancing mobile applications with location-based services, and improving safety and security during emergencies.

Submitted 11 December, 2023; originally announced December 2023.

Journal ref: Lecture Notes in Computer Science, vol 14418, year 2023. Springer, Cham

arXiv:2312.07418 [pdf]

Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023)

Authors: Kabita Parajuli, Shashidhar Ram Joshi

Abstract: Video captioning in Nepali, a language written in the Devanagari script, presents a unique challenge due to the lack of existing academic work in this domain. This work develops a novel encoder-decoder paradigm for Nepali video captioning to tackle this difficulty. LSTM and GRU sequence-to-sequence models are used in the model to produce related textual descriptions based on features retrieved from video frames using CNNs. A Nepali video captioning dataset is generated from the Microsoft Research Video Description Corpus (MSVD) dataset using Google Translate and manual post-editing. The efficiency of the model for Devanagari-scripted video captioning is demonstrated by BLEU, METEOR, and ROUGE measures, which are used to assess its performance.

Submitted 19 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.01306 [pdf, other]

doi 10.1007/978-981-99-6550-2_37

On Significance of Subword tokenization for Low Resource and Efficient Named Entity Recognition: A case study in Marathi

Authors: Harsh Chaudhari, Anuja Patil, Dhanashree Lavekar, Pranav Khairnar, Raviraj Joshi, Sachin Pande

Abstract: Named Entity Recognition (NER) systems play a vital role in NLP applications such as machine translation, summarization, and question-answering. These systems identify named entities, which encompass real-world concepts like locations, persons, and organizations. Despite extensive research on NER systems for the English language, they have not received adequate attention in the context of low-resource languages. In this work, we focus on NER for low-resource languages and present our case study in the context of the Indian language Marathi. The advancement of NLP research revolves around the utilization of pre-trained transformer models such as BERT for the development of NER models. However, we focus on improving the performance of shallow models based on CNN and LSTM by combining the best of both worlds. In the era of transformers, these traditional deep learning models are still relevant because of their high computational efficiency. We propose a hybrid approach for efficient NER by integrating a BERT-based subword tokenizer into vanilla CNN/LSTM models. We show that this simple approach of replacing a traditional word-based tokenizer with a BERT-tokenizer brings the accuracy of vanilla single-layer models closer to that of deep pre-trained models like BERT. We show the importance of using sub-word tokenization for NER and present our study toward building efficient NLP systems. The evaluation is performed on L3Cube-MahaNER dataset using tokenizers from MahaBERT, MahaGPT, IndicBERT, and mBERT.

Submitted 3 December, 2023; originally announced December 2023.

Comments: Accepted at ICDAM 2023

arXiv:2312.01206 [pdf, other]

The Formation of Bars and Warps in Rotating Halos

Authors: Robin Joshi, Lawrence M. Widrow

Abstract: We investigate the effects of halo kinematics on the dynamics of stellar discs by simulating the evolution of isolated disc-halo systems from equilibrium initial conditions. Our main results come from four simulations where the initial disc is identical and the halo is either treated as a rigid potential or is live with isotropic orbits or orbits that preferentially rotate with or counter to the disc. We confirm previous results that bar formation is more vigorous in models with a live halo than a rigid one and is further enhanced when halo orbits preferentially rotate with the disc. We discuss two types of buckling events with different symmetries about the mid plane, one that occurs just as the bar is forming and the other well after the bar has been established. We also show that warps are most easily excited and maintained when the halo is counter-rotating with the disc, in agreement with theoretical predictions. Our most novel result is the discovery of a rotating halo instability, which causes the disc and halo cusp to spiral outward from the centre of mass of the system whether the halo rotates with the disc or counter to it and also occurs in a disc bulge halo system that does not form a bar. We provide a heuristic linear model that captures the essential dynamics of the instability.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 14 pages, 13 figures. Accepted for publication in Monthly Notices of the Royal Astronomical Society

arXiv:2312.01107 [pdf, other]

Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning

Authors: Raviraj Joshi, Nikesh Garera

Abstract: Text-to-speech (TTS) systems are being built using end-to-end deep learning approaches. However, these systems require huge amounts of training data. We present our approach to build production-quality TTS and perform speaker adaptation in extremely low-resource settings. We propose a transfer learning approach using high-resource language data and synthetically generated data. We transfer the learnings from the out-of-domain high-resource English language. Further, we make use of an out-of-the-box single-speaker TTS in the target language to generate in-domain synthetic data. We employ a three-step approach to train a high-quality single-speaker TTS system in a low-resource Indian language, Hindi. We use a Tacotron2-like setup with a spectrogram prediction network and a WaveGlow vocoder. The Tacotron2 acoustic model is trained on English data, followed by synthetic Hindi data from the existing TTS system. Finally, the decoder of this model is fine-tuned on only 3 hours of target Hindi speaker data to enable rapid speaker adaptation. We show the importance of this dual pre-training and decoder-only fine-tuning using subjective MOS evaluation. Using transfer learning from a high-resource language and a synthetic corpus, we present a low-cost solution to train a custom TTS model.

Submitted 2 December, 2023; originally announced December 2023.

Comments: Accepted at PACLIC 2023

arXiv:2312.01103 [pdf, other]

doi 10.1007/978-3-031-48312-7_12

Code-Mixed Text to Speech Synthesis under Low-Resource Constraints

Authors: Raviraj Joshi, Nikesh Garera

Abstract: Text-to-speech (TTS) systems are an important component in voice-based e-commerce applications. These applications include end-to-end voice assistants and customer experience (CX) voice bots. Code-mixed TTS is also relevant in these applications since the product names are commonly described in English while the surrounding text is in a regional language. In this work, we describe our approaches for production-quality code-mixed Hindi-English TTS systems built for e-commerce applications. We propose a data-oriented approach by utilizing monolingual datasets in individual languages. We leverage a transliteration model to convert the Roman text into a common Devanagari script and then combine both datasets for training. We show that such single-script bilingual training without any code-mixing works well for pure code-mixed test sets. We further present an exhaustive evaluation of single-speaker adaptation and multi-speaker training with a Tacotron2 + Waveglow setup to show that the former approach works better. These approaches are also coupled with transfer learning and decoder-only fine-tuning to improve performance. We compare these approaches with the Google TTS and report a positive CMOS score of 0.02 with the proposed transfer learning approach. We also perform low-resource voice adaptation experiments to show that a new voice can be onboarded with just 3 hrs of data. This highlights the importance of our pre-trained models in resource-constrained settings. This subjective evaluation is performed on a large number of out-of-domain pure code-mixed sentences to demonstrate the high quality of the systems.

Submitted 2 December, 2023; originally announced December 2023.

Comments: Accepted at SPECOM 2023

arXiv:2311.17722 [pdf, other]

SenTest: Evaluating Robustness of Sentence Encoders

Authors: Tanmay Chavan, Shantanu Patankar, Aditya Kane, Omkar Gokhale, Geetanjali Kale, Raviraj Joshi

Abstract: Contrastive learning has proven to be an effective method for pre-training models using weakly labeled data in the vision domain. Sentence transformers are the NLP counterparts to this architecture, and have been growing in popularity due to their rich and effective sentence representations. Having effective sentence representations is paramount in multiple tasks, such as information retrieval, retrieval augmented generation (RAG), and sentence comparison. Keeping in mind the deployability factor of transformers, evaluating the robustness of sentence transformers is of utmost importance. This work focuses on evaluating the robustness of sentence encoders. We employ several adversarial attacks to evaluate their robustness. This system uses character-level attacks in the form of random character substitution, word-level attacks in the form of synonym replacement, and sentence-level attacks in the form of intra-sentence word order shuffling. The results of the experiments strongly undermine the robustness of sentence encoders. The models produce significantly different predictions as well as embeddings on perturbed datasets. The accuracy of the models can fall by up to 15 percent on perturbed datasets as compared to unperturbed datasets. Furthermore, the experiments demonstrate that these embeddings do capture the semantic and syntactic structure (sentence order) of sentences. However, existing supervised classification strategies fail to leverage this information, and merely function as n-gram detectors.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.14335 [pdf, other]

Comparative Analysis of Transformers for Modeling Tabular Data: A Casestudy using Industry Scale Dataset

Authors: Usneek Singh, Piyush Arora, Shamika Ganesan, Mohit Kumar, Siddhant Kulkarni, Salil R. Joshi

Abstract: We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, the effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the necessity for efficient pre-processing of categorical and numerical features, and addressing substantial computational requirements. To overcome the identified challenges, the study conducts an extensive examination of various transformer-based models using both synthetic datasets and the default prediction Kaggle dataset (2022) from American Express. The paper presents crucial insights into optimal data pre-processing, compares pre-training and direct supervised learning methods, discusses strategies for managing categorical and numerical features, and highlights trade-offs between computational resources and performance. Focusing on temporal financial data modeling, the research aims to facilitate the systematic development and deployment of transformer-based models in real-world scenarios, emphasizing scalability.

Submitted 24 November, 2023; originally announced November 2023.

Comments: Accepted at 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
