Search | arXiv e-print repository

'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs

Authors: Ishika Joshi, Ishita Gupta, Adrita Dey, Tapan Parikh

Abstract: Large Language Models (LLMs) are increasingly being used to generate text across various languages, for tasks such as translation, customer support, and education. Despite these advancements, LLMs show notable gender biases in English, which become even more pronounced when generating content in relatively underrepresented languages like Hindi. This study explores implicit gender biases in Hindi t… ▽ More Large Language Models (LLMs) are increasingly being used to generate text across various languages, for tasks such as translation, customer support, and education. Despite these advancements, LLMs show notable gender biases in English, which become even more pronounced when generating content in relatively underrepresented languages like Hindi. This study explores implicit gender biases in Hindi text generation and compares them to those in English. We developed Hindi datasets inspired by WinoBias to examine stereotypical patterns in responses from models like GPT-4o and Claude-3 sonnet. Our results reveal a significant gender bias of 87.8% in Hindi, compared to 33.4% in English GPT-4o generation, with Hindi responses frequently relying on gender stereotypes related to occupations, power hierarchies, and social class. This research underscores the variation in gender biases across languages and provides considerations for navigating these biases in generative AI systems. △ Less

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.10545 [pdf, other]

ResEmoteNet: Bridging Accuracy and Loss Reduction in Facial Emotion Recognition

Authors: Arnab Kumar Roy, Hemant Kumar Kathania, Adhitiya Sharma, Abhishek Dey, Md. Sarfaraj Alam Ansari

Abstract: The human face is a silent communicator, expressing emotions and thoughts through its facial expressions. With the advancements in computer vision in recent years, facial emotion recognition technology has made significant strides, enabling machines to decode the intricacies of facial cues. In this work, we propose ResEmoteNet, a novel deep learning architecture for facial emotion recognition desi… ▽ More The human face is a silent communicator, expressing emotions and thoughts through its facial expressions. With the advancements in computer vision in recent years, facial emotion recognition technology has made significant strides, enabling machines to decode the intricacies of facial cues. In this work, we propose ResEmoteNet, a novel deep learning architecture for facial emotion recognition designed with the combination of Convolutional, Squeeze-Excitation (SE) and Residual Networks. The inclusion of SE block selectively focuses on the important features of the human face, enhances the feature representation and suppresses the less relevant ones. This helps in reducing the loss and enhancing the overall model performance. We also integrate the SE block with three residual blocks that help in learning more complex representation of the data through deeper layers. We evaluated ResEmoteNet on three open-source databases: FER2013, RAF-DB, and AffectNet, achieving accuracies of 79.79%, 94.76%, and 72.39%, respectively. The proposed network outperforms state-of-the-art models across all three databases. The source code for ResEmoteNet is available at https://github.com/ArnabKumarRoy02/ResEmoteNet. △ Less

Submitted 1 September, 2024; originally announced September 2024.

Comments: 5 pages, 3 figures, 3 tables

arXiv:2408.14460 [pdf, other]

Cloud-Based Federation Framework and Prototype for Open, Scalable, and Shared Access to NextG and IoT Testbeds

Authors: Maxwell McManus, Tenzin Rinchen, Annoy Dey, Sumanth Thota, Zhaoxi Zhang, Jiangqi Hu, Xi Wang, Mingyue Ji, Nicholas Mastronarde, Elizabeth Serena Bentley, Michael Medley, Zhangyu Guan

Abstract: In this work, we present a new federation framework for UnionLabs, an innovative cloud-based resource-sharing infrastructure designed for next-generation (NextG) and Internet of Things (IoT) over-the-air (OTA) experiments. The framework aims to reduce the federation complexity for testbeds developers by automating tedious backend operations, thereby providing scalable federation and remote access… ▽ More In this work, we present a new federation framework for UnionLabs, an innovative cloud-based resource-sharing infrastructure designed for next-generation (NextG) and Internet of Things (IoT) over-the-air (OTA) experiments. The framework aims to reduce the federation complexity for testbeds developers by automating tedious backend operations, thereby providing scalable federation and remote access to various wireless testbeds. We first describe the key components of the new federation framework, including the Systems Manager Integration Engine (SMIE), the Automated Script Generator (ASG), and the Database Context Manager (DCM). We then prototype and deploy the new Federation Plane on the Amazon Web Services (AWS) public cloud, demonstrating its effectiveness by federating two wireless testbeds: i) UB NeXT, a 5G-and-beyond (5G+) testbed at the University at Buffalo, and ii) UT IoT, an IoT testbed at the University of Utah. Through this work we aim to initiate a grassroots campaign to democratize access to wireless research testbeds with heterogeneous hardware resources and network environment, and accelerate the establishment of a mature, open experimental ecosystem for the wireless community. The API of the new Federation Plane will be released to the community after internal testing is completed. △ Less

Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

arXiv:2407.19812 [pdf, other]

Image-text matching for large-scale book collections

Authors: Artemis Llabrés, Arka Ujjal Dey, Dimosthenis Karatzas, Ernest Valveny

Abstract: We address the problem of detecting and mapping all books in a collection of images to entries in a given book catalogue. Instead of performing independent retrieval for each book detected, we treat the image-text mapping problem as a many-to-many matching process, looking for the best overall match between the two sets. We combine a state-of-the-art segmentation method (SAM) to detect book spines… ▽ More We address the problem of detecting and mapping all books in a collection of images to entries in a given book catalogue. Instead of performing independent retrieval for each book detected, we treat the image-text mapping problem as a many-to-many matching process, looking for the best overall match between the two sets. We combine a state-of-the-art segmentation method (SAM) to detect book spines and extract book information using a commercial OCR. We then propose a two-stage approach for text-image matching, where CLIP embeddings are used first for fast matching, followed by a second slower stage to refine the matching, employing either the Hungarian Algorithm or a BERT-based model trained to cope with noisy OCR input and partial text matches. To evaluate our approach, we publish a new dataset of annotated bookshelf images that covers the whole book collection of a public library in Spain. In addition, we provide two target lists of book metadata, a closed-set of 15k book titles that corresponds to the known library inventory, and an open-set of 2.3M book titles to simulate an open-world scenario. We report results on two settings, on one hand on a matching-only task, where the book segments and OCR is given and the objective is to perform many-to-many matching against the target lists, and a combined detection and matching task, where books must be first detected and recognised before they are matched to the target list entries. We show that both the Hungarian Matching and the proposed BERT-based model outperform a fuzzy string matching baseline, and we highlight inherent limitations of the matching algorithms as the target increases in size, and when either of the two sets (detected books or target book list) is incomplete. The dataset and code are available at https://github.com/llabres/library-dataset △ Less

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19481 [pdf, other]

What can we learn about Reionization astrophysical parameters using Gaussian Process Regression?

Authors: Purba Mukherjee, Antara Dey, Supratik Pal

Abstract: Reionization is one of the least understood processes in the evolution history of the Universe, mostly because of the numerous astrophysical processes occurring simultaneously about which we do not have a very clear idea so far. In this article, we use the Gaussian Process Regression (GPR) method to learn the reionization history and infer the astrophysical parameters. We reconstruct the UV lumino… ▽ More Reionization is one of the least understood processes in the evolution history of the Universe, mostly because of the numerous astrophysical processes occurring simultaneously about which we do not have a very clear idea so far. In this article, we use the Gaussian Process Regression (GPR) method to learn the reionization history and infer the astrophysical parameters. We reconstruct the UV luminosity density function using the HFF and early JWST data. From the reconstructed history of reionization, the global differential brightness temperature fluctuation during this epoch has been computed. We perform MCMC analysis of the global 21-cm signal using the instrumental specifications of SARAS, in combination with Lyman-$α$ ionization fraction data, Planck optical depth measurements and UV luminosity data. Our analysis reveals that GPR can help infer the astrophysical parameters in a model-agnostic way than conventional methods. Additionally, we analyze the 21-cm power spectrum using the reconstructed history of reionization and demonstrate how the future 21-cm mission SKA, in combination with Planck and Lyman-$α$ forest data, improves the bounds on the reionization astrophysical parameters by doing a joint MCMC analysis for the astrophysical parameters plus 6 cosmological parameters for $Λ$CDM model. The results make the GPR-based reconstruction technique a robust learning process and the inferences on the astrophysical parameters obtained therefrom are quite reliable that can be used for future analysis. △ Less

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2405.13967 [pdf, other]

DeTox: Toxic Subspace Projection for Model Editing

Authors: Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, Junjie Hu

Abstract: Recent alignment algorithms such as direct preference optimization (DPO) have been developed to improve the safety of large language models (LLMs) by training these models to match human behaviors exemplified by preference data. However, these methods are both computationally intensive and lacking in controllability and transparency, making them prone to jailbreaking and inhibiting their widesprea… ▽ More Recent alignment algorithms such as direct preference optimization (DPO) have been developed to improve the safety of large language models (LLMs) by training these models to match human behaviors exemplified by preference data. However, these methods are both computationally intensive and lacking in controllability and transparency, making them prone to jailbreaking and inhibiting their widespread use. Furthermore, these tuning-based methods require large-scale preference data for training and are susceptible to noisy preference data. In this paper, we introduce a tuning-free alignment alternative (DeTox) and demonstrate its effectiveness under the use case of toxicity reduction. Grounded on theory from factor analysis, DeTox is a sample-efficient model editing approach that identifies a toxic subspace in the model parameter space and reduces model toxicity by projecting away the detected subspace. The toxic sub-space is identified by extracting preference data embeddings from the language model, and removing non-toxic information from these embeddings. We show that DeTox is more sample-efficient than DPO, further showcasing greater robustness to noisy data. Finally, we establish both theoretical and empirical connections between DeTox and DPO, showing that DeTox can be interpreted as a denoised version of a single DPO step. △ Less

Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2405.04539 [pdf, other]

Some variation of COBRA in sequential learning setup

Authors: Aryan Bhambu, Arabin Kumar Dey

Abstract: This research paper introduces innovative approaches for multivariate time series forecasting based on different variations of the combined regression strategy. We use specific data preprocessing techniques which makes a radical change in the behaviour of prediction. We compare the performance of the model based on two types of hyper-parameter tuning Bayesian optimisation (BO) and Usual Grid searc… ▽ More This research paper introduces innovative approaches for multivariate time series forecasting based on different variations of the combined regression strategy. We use specific data preprocessing techniques which makes a radical change in the behaviour of prediction. We compare the performance of the model based on two types of hyper-parameter tuning Bayesian optimisation (BO) and Usual Grid search. Our proposed methodologies outperform all state-of-the-art comparative models. We illustrate the methodologies through eight time series datasets from three categories: cryptocurrency, stock index, and short-term load forecasting. △ Less

Submitted 7 April, 2024; originally announced May 2024.

arXiv:2404.14665 [pdf, other]

Illuminating the Unseen: Investigating the Context-induced Harms in Behavioral Sensing

Authors: Han Zhang, Vedant Das Swain, Leijie Wang, Nan Gao, Yilun Sheng, Xuhai Xu, Flora D. Salim, Koustuv Saha, Anind K. Dey, Jennifer Mankoff

Abstract: Behavioral sensing technologies are rapidly evolving across a range of well-being applications. Despite its potential, concerns about the responsible use of such technology are escalating. In response, recent research within the sensing technology has started to address these issues. While promising, they primarily focus on broad demographic categories and overlook more nuanced, context-specific i… ▽ More Behavioral sensing technologies are rapidly evolving across a range of well-being applications. Despite its potential, concerns about the responsible use of such technology are escalating. In response, recent research within the sensing technology has started to address these issues. While promising, they primarily focus on broad demographic categories and overlook more nuanced, context-specific identities. These approaches lack grounding within domain-specific harms that arise from deploying sensing technology in diverse social, environmental, and technological settings. Additionally, existing frameworks for evaluating harms are designed for a generic ML life cycle, and fail to adapt to the dynamic and longitudinal considerations for behavioral sensing technology. To address these gaps, we introduce a framework specifically designed for evaluating behavioral sensing technologies. This framework emphasizes a comprehensive understanding of context, particularly the situated identities of users and the deployment settings of the sensing technology. It also highlights the necessity for iterative harm mitigation and continuous maintenance to adapt to the evolving nature of technology and its use. We demonstrate the feasibility and generalizability of our framework through post-hoc evaluations on two real-world behavioral sensing studies conducted in different international contexts, involving varied population demographics and machine learning tasks. Our evaluations provide empirical evidence of both situated identity-based harm and more domain-specific harms, and discuss the trade-offs introduced by implementing bias mitigation techniques. △ Less

Submitted 5 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 26 pages, 8 tables, and 1 figure (excluding appendix)

MSC Class: 68U35 ACM Class: H.5.0; I.2.m

arXiv:2404.14563 [pdf, other]

Exploring Algorithmic Explainability: Generating Explainable AI Insights for Personalized Clinical Decision Support Focused on Cannabis Intoxication in Young Adults

Authors: Tongze Zhang, Tammy Chung, Anind Dey, Sang Won Bae

Abstract: This study explores the possibility of facilitating algorithmic decision-making by combining interpretable artificial intelligence (XAI) techniques with sensor data, with the aim of providing researchers and clinicians with personalized analyses of cannabis intoxication behavior. SHAP analyzes the importance and quantifies the impact of specific factors such as environmental noise or heart rate, e… ▽ More This study explores the possibility of facilitating algorithmic decision-making by combining interpretable artificial intelligence (XAI) techniques with sensor data, with the aim of providing researchers and clinicians with personalized analyses of cannabis intoxication behavior. SHAP analyzes the importance and quantifies the impact of specific factors such as environmental noise or heart rate, enabling clinicians to pinpoint influential behaviors and environmental conditions. SkopeRules simplify the understanding of cannabis use for a specific activity or environmental use. Decision trees provide a clear visualization of how factors interact to influence cannabis consumption. Counterfactual models help identify key changes in behaviors or conditions that may alter cannabis use outcomes, to guide effective individualized intervention strategies. This multidimensional analytical approach not only unveils changes in behavioral and physiological states after cannabis use, such as frequent fluctuations in activity states, nontraditional sleep patterns, and specific use habits at different times and places, but also highlights the significance of individual differences in responses to cannabis use. These insights carry profound implications for clinicians seeking to gain a deeper understanding of the diverse needs of their patients and for tailoring precisely targeted intervention strategies. Furthermore, our findings highlight the pivotal role that XAI technologies could play in enhancing the transparency and interpretability of Clinical Decision Support Systems (CDSS), with a particular focus on substance misuse treatment. This research significantly contributes to ongoing initiatives aimed at advancing clinical practices that aim to prevent and reduce cannabis-related harms to health, positioning XAI as a supportive tool for clinicians and researchers alike. △ Less

Submitted 29 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 2024 International Conference on Activity and Behavior Computing

arXiv:2404.10702 [pdf, other]

Retrieval Augmented Verification: Unveiling Disinformation with Structured Representations for Zero-Shot Real-Time Evidence-guided Fact-Checking of Multi-modal Social media posts

Authors: Arka Ujjal Dey, Artemis Llabrés, Ernest Valveny, Dimosthenis Karatzas

Abstract: Social Media posts, where real images are unscrupulously reused along with provocative text to promote a particular idea, have been one of the major sources of disinformation. By design, these claims are without editorial oversight and accessible to a vast population who otherwise may not have access to multiple information sources. This implies the need to fact-check these posts and clearly expla… ▽ More Social Media posts, where real images are unscrupulously reused along with provocative text to promote a particular idea, have been one of the major sources of disinformation. By design, these claims are without editorial oversight and accessible to a vast population who otherwise may not have access to multiple information sources. This implies the need to fact-check these posts and clearly explain which parts of the posts are fake. In the supervised learning setup, this is often reduced to a binary classification problem, neglecting all intermediate stages. Further, these claims often involve recent events on which systems trained on historical data are prone to fail. In this work, we propose a zero-shot approach by retrieving real-time web-scraped evidence from multiple news websites and matching them with the claim text and image using pretrained language vision systems. We propose a graph structured representation, which a) allows us to gather evidence automatically and b) helps generate interpretable results by explicitly pointing out which parts of the claim can not be verified. Our zero-shot method, with improved interpretability, generates competitive results against the state-of-the-art methods △ Less

Submitted 29 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.06246 [pdf, other]

GHNeRF: Learning Generalizable Human Features with Efficient Neural Radiance Fields

Authors: Arnab Dey, Di Yang, Rohith Agaram, Antitza Dantcheva, Andrew I. Comport, Srinath Sridhar, Jean Martinet

Abstract: Recent advances in Neural Radiance Fields (NeRF) have demonstrated promising results in 3D scene representations, including 3D human representations. However, these representations often lack crucial information on the underlying human pose and structure, which is crucial for AR/VR applications and games. In this paper, we introduce a novel approach, termed GHNeRF, designed to address these limita… ▽ More Recent advances in Neural Radiance Fields (NeRF) have demonstrated promising results in 3D scene representations, including 3D human representations. However, these representations often lack crucial information on the underlying human pose and structure, which is crucial for AR/VR applications and games. In this paper, we introduce a novel approach, termed GHNeRF, designed to address these limitations by learning 2D/3D joint locations of human subjects with NeRF representation. GHNeRF uses a pre-trained 2D encoder streamlined to extract essential human features from 2D images, which are then incorporated into the NeRF framework in order to encode human biomechanic features. This allows our network to simultaneously learn biomechanic features, such as joint locations, along with human geometry and texture. To assess the effectiveness of our method, we conduct a comprehensive comparison with state-of-the-art human NeRF techniques and joint estimation algorithms. Our results show that GHNeRF can achieve state-of-the-art results in near real-time. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06152 [pdf, other]

HFNeRF: Learning Human Biomechanic Features with Neural Radiance Fields

Authors: Arnab Dey, Di Yang, Antitza Dantcheva, Jean Martinet

Abstract: In recent advancements in novel view synthesis, generalizable Neural Radiance Fields (NeRF) based methods applied to human subjects have shown remarkable results in generating novel views from few images. However, this generalization ability cannot capture the underlying structural features of the skeleton shared across all instances. Building upon this, we introduce HFNeRF: a novel generalizable… ▽ More In recent advancements in novel view synthesis, generalizable Neural Radiance Fields (NeRF) based methods applied to human subjects have shown remarkable results in generating novel views from few images. However, this generalization ability cannot capture the underlying structural features of the skeleton shared across all instances. Building upon this, we introduce HFNeRF: a novel generalizable human feature NeRF aimed at generating human biomechanic features using a pre-trained image encoder. While previous human NeRF methods have shown promising results in the generation of photorealistic virtual avatars, such methods lack underlying human structure or biomechanic features such as skeleton or joint information that are crucial for downstream applications including Augmented Reality (AR)/Virtual Reality (VR). HFNeRF leverages 2D pre-trained foundation models toward learning human features in 3D using neural rendering, and then volume rendering towards generating 2D feature maps. We evaluate HFNeRF in the skeleton estimation task by predicting heatmaps as features. The proposed method is fully differentiable, allowing to successfully learn color, geometry, and human skeleton in a simultaneous manner. This paper presents preliminary results of HFNeRF, illustrating its potential in generating realistic virtual avatars with biomechanic features using NeRF. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.01413 [pdf, other]

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration… ▽ More The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, where an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models are fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs. △ Less

Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.05584 [pdf, other]

doi 10.1145/3613904.3642747

Time2Stop: Adaptive and Explainable Human-AI Loop for Smartphone Overuse Intervention

Authors: Adiba Orzikulova, Han Xiao, Zhipeng Li, Yukang Yan, Yuntao Wang, Yuanchun Shi, Marzyeh Ghassemi, Sung-Ju Lee, Anind K Dey, Xuhai "Orson" Xu

Abstract: Despite a rich history of investigating smartphone overuse intervention techniques, AI-based just-in-time adaptive intervention (JITAI) methods for overuse reduction are lacking. We develop Time2Stop, an intelligent, adaptive, and explainable JITAI system that leverages machine learning to identify optimal intervention timings, introduces interventions with transparent AI explanations, and collect… ▽ More Despite a rich history of investigating smartphone overuse intervention techniques, AI-based just-in-time adaptive intervention (JITAI) methods for overuse reduction are lacking. We develop Time2Stop, an intelligent, adaptive, and explainable JITAI system that leverages machine learning to identify optimal intervention timings, introduces interventions with transparent AI explanations, and collects user feedback to establish a human-AI loop and adapt the intervention model over time. We conducted an 8-week field experiment (N=71) to evaluate the effectiveness of both the adaptation and explanation aspects of Time2Stop. Our results indicate that our adaptive models significantly outperform the baseline methods on intervention accuracy (>32.8\% relatively) and receptivity (>8.0\%). In addition, incorporating explanations further enhances the effectiveness by 53.8\% and 11.4\% on accuracy and receptivity, respectively. Moreover, Time2Stop significantly reduces overuse, decreasing app visit frequency by 7.0$\sim$8.9\%. Our subjective data also echoed these quantitative measures. Participants preferred the adaptive interventions and rated the system highly on intervention time accuracy, effectiveness, and level of trust. We envision our work can inspire future research on JITAI systems with a human-AI loop to evolve with users. △ Less

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2401.08588 [pdf]

Improved Pothole Detection Using YOLOv7 and ESRGAN

Authors: Nirmal Kumar Rout, Gyanateet Dutta, Varun Sinha, Arghadeep Dey, Subhrangshu Mukherjee, Gopal Gupta

Abstract: Potholes are common road hazards that is causing damage to vehicles and posing a safety risk to drivers. The introduction of Convolutional Neural Networks (CNNs) is widely used in the industry for object detection based on Deep Learning methods and has achieved significant progress in hardware improvement and software implementations. In this paper, a unique better algorithm is proposed to warrant… ▽ More Potholes are common road hazards that is causing damage to vehicles and posing a safety risk to drivers. The introduction of Convolutional Neural Networks (CNNs) is widely used in the industry for object detection based on Deep Learning methods and has achieved significant progress in hardware improvement and software implementations. In this paper, a unique better algorithm is proposed to warrant the use of low-resolution cameras or low-resolution images and video feed for automatic pothole detection using Super Resolution (SR) through Super Resolution Generative Adversarial Networks (SRGANs). Then we have proceeded to establish a baseline pothole detection performance on low quality and high quality dashcam images using a You Only Look Once (YOLO) network, namely the YOLOv7 network. We then have illustrated and examined the speed and accuracy gained above the benchmark after having upscaling implementation on the low quality images. △ Less

Submitted 10 November, 2023; originally announced January 2024.

arXiv:2312.04223 [pdf, other]

doi 10.1145/3613905.3636286

PhysioCHI: Towards Best Practices for Integrating Physiological Signals in HCI

Authors: Francesco Chiossi, Ekaterina R. Stepanova, Benjamin Tag, Monica Perusquia-Hernandez, Alexandra Kitson, Arindam Dey, Sven Mayer, Abdallah El Ali

Abstract: Recently, we saw a trend toward using physiological signals in interactive systems. These signals, offering deep insights into users' internal states and health, herald a new era for HCI. However, as this is an interdisciplinary approach, many challenges arise for HCI researchers, such as merging diverse disciplines, from understanding physiological functions to design expertise. Also, isolated re… ▽ More Recently, we saw a trend toward using physiological signals in interactive systems. These signals, offering deep insights into users' internal states and health, herald a new era for HCI. However, as this is an interdisciplinary approach, many challenges arise for HCI researchers, such as merging diverse disciplines, from understanding physiological functions to design expertise. Also, isolated research endeavors limit the scope and reach of findings. This workshop aims to bridge these gaps, fostering cross-disciplinary discussions on usability, open science, and ethics tied to physiological data in HCI. In this workshop, we will discuss best practices for embedding physiological signals in interactive systems. Through collective efforts, we seek to craft a guiding document for best practices in physiological HCI research, ensuring that it remains grounded in shared principles and methodologies as the field advances. △ Less

Submitted 11 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.12161 [pdf, other]

ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing

Authors: Ayush Kumar Shah, Bryan Manrique Amador, Abhisek Dey, Ming Creekmore, Blake Ocampo, Scott Denmark, Richard Zanibbi

Abstract: Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new parser that uses these born-digital PDF primitives as input. The parsing model is fast and accurate, and does not require GPUs, Optical Character Recognition (OCR), or vectorization.… ▽ More Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new parser that uses these born-digital PDF primitives as input. The parsing model is fast and accurate, and does not require GPUs, Optical Character Recognition (OCR), or vectorization. We use the parser to annotate raster images and then train a new multi-task neural network for recognizing molecules in raster images. We evaluate our parsers using SMILES and standard benchmarks, along with a novel evaluation protocol comparing molecular graphs directly that supports automatic error compilation and reveals errors missed by SMILES-based evaluation. On the synthetic USPTO benchmark, our born-digital parser obtains a recognition rate of 98.4% (1% higher than previous models) and our relatively simple neural parser for raster images obtains a rate of 85% using less training data than existing neural approaches (thousands vs. millions of molecules). △ Less

Submitted 31 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: 20 pages without references, 12 figures, 4 Tables, submitted to International Conference on Document Analysis and Recognition (ICDAR) - Journal Track

arXiv:2309.00417 [pdf, other]

Area-norm COBRA on Conditional Survival Prediction

Authors: Rahul Goswami, Arabin Kr. Dey

Abstract: The paper explores a different variation of combined regression strategy to calculate the conditional survival function. We use regression based weak learners to create the proposed ensemble technique. The proposed combined regression strategy uses proximity measure as area between two survival curves. The proposed model shows a construction which ensures that it performs better than the Random Su… ▽ More The paper explores a different variation of combined regression strategy to calculate the conditional survival function. We use regression based weak learners to create the proposed ensemble technique. The proposed combined regression strategy uses proximity measure as area between two survival curves. The proposed model shows a construction which ensures that it performs better than the Random Survival Forest. The paper discusses a novel technique to select the most important variable in the combined regression setup. We perform a simulation study to show that our proposition for finding relevance of the variables works quite well. We also use three real-life datasets to illustrate the model. △ Less

Submitted 9 September, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.08710 [pdf, other]

doi 10.1145/3594739.3610677

A Framework for Designing Fair Ubiquitous Computing Systems

Authors: Han Zhang, Leijie Wang, Yilun Sheng, Xuhai Xu, Jennifer Mankoff, Anind K. Dey

Abstract: Over the past few decades, ubiquitous sensors and systems have been an integral part of humans' everyday life. They augment human capabilities and provide personalized experiences across diverse contexts such as healthcare, education, and transportation. However, the widespread adoption of ubiquitous computing has also brought forth concerns regarding fairness and equitable treatment. As these sys… ▽ More Over the past few decades, ubiquitous sensors and systems have been an integral part of humans' everyday life. They augment human capabilities and provide personalized experiences across diverse contexts such as healthcare, education, and transportation. However, the widespread adoption of ubiquitous computing has also brought forth concerns regarding fairness and equitable treatment. As these systems can make automated decisions that impact individuals, it is essential to ensure that they do not perpetuate biases or discriminate against specific groups. While fairness in ubiquitous computing has been an acknowledged concern since the 1990s, it remains understudied within the field. To bridge this gap, we propose a framework that incorporates fairness considerations into system design, including prioritizing stakeholder perspectives, inclusive data collection, fairness-aware algorithms, appropriate evaluation criteria, enhancing human engagement while addressing privacy concerns, and interactive improvement and regular monitoring. Our framework aims to guide the development of fair and unbiased ubiquitous computing systems, ensuring equal treatment and positive societal impact. △ Less

Submitted 16 August, 2023; originally announced August 2023.

Comments: 8 pages, 1 figure, published in 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing

MSC Class: 68U35 ACM Class: H.5.2; I.2.m

arXiv:2307.16897 [pdf, other]

DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields

Authors: Cheng-You Lu, Peisen Zhou, Angela Xing, Chandradeep Pokhariya, Arnab Dey, Ishaan Shah, Rugved Mavidipalli, Dylan Hu, Andrew Comport, Kefan Chen, Srinath Sridhar

Abstract: Advances in neural fields are enabling high-fidelity capture of the shape and appearance of dynamic 3D scenes. However, their capabilities lag behind those offered by conventional representations such as 2D videos because of algorithmic challenges and the lack of large-scale multi-view real-world datasets. We address the dataset limitation with DiVa-360, a real-world 360 dynamic visual dataset tha… ▽ More Advances in neural fields are enabling high-fidelity capture of the shape and appearance of dynamic 3D scenes. However, their capabilities lag behind those offered by conventional representations such as 2D videos because of algorithmic challenges and the lack of large-scale multi-view real-world datasets. We address the dataset limitation with DiVa-360, a real-world 360 dynamic visual dataset that contains synchronized high-resolution and long-duration multi-view video sequences of table-scale scenes captured using a customized low-cost system with 53 cameras. It contains 21 object-centric sequences categorized by different motion types, 25 intricate hand-object interaction sequences, and 8 long-duration sequences for a total of 17.4 M image frames. In addition, we provide foreground-background segmentation masks, synchronized audio, and text descriptions. We benchmark the state-of-the-art dynamic neural field methods on DiVa-360 and provide insights about existing methods and future challenges on long-duration neural field capture. △ Less

Submitted 26 March, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

arXiv:2307.14385 [pdf, other]

doi 10.1145/3643540

Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data

Authors: Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, Dakuo Wang

Abstract: Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA… ▽ More Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best-finetuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize the important limitations before achieving deployability in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research. △ Less

Submitted 28 January, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: Published at Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) 2024

MSC Class: 68U35 ACM Class: H.5.2; I.2.m

arXiv:2306.03381 [pdf, other]

VR.net: A Real-world Dataset for Virtual Reality Motion Sickness Research

Authors: Elliott Wen, Chitralekha Gupta, Prasanth Sasikumar, Mark Billinghurst, James Wilmott, Emily Skow, Arindam Dey, Suranga Nanayakkara

Abstract: Researchers have used machine learning approaches to identify motion sickness in VR experience. These approaches demand an accurately-labeled, real-world, and diverse dataset for high accuracy and generalizability. As a starting point to address this need, we introduce `VR.net', a dataset offering approximately 12-hour gameplay videos from ten real-world games in 10 diverse genres. For each video… ▽ More Researchers have used machine learning approaches to identify motion sickness in VR experience. These approaches demand an accurately-labeled, real-world, and diverse dataset for high accuracy and generalizability. As a starting point to address this need, we introduce `VR.net', a dataset offering approximately 12-hour gameplay videos from ten real-world games in 10 diverse genres. For each video frame, a rich set of motion sickness-related labels, such as camera/object movement, depth field, and motion flow, are accurately assigned. Building such a dataset is challenging since manual labeling would require an infeasible amount of time. Instead, we utilize a tool to automatically and precisely extract ground truth data from 3D engines' rendering pipelines without accessing VR games' source code. We illustrate the utility of VR.net through several applications, such as risk factor detection and sickness level prediction. We continuously expand VR.net and envision its next version offering 10X more data than the current form. We believe that the scale, accuracy, and diversity of VR.net can offer unparalleled opportunities for VR motion sickness research and beyond. △ Less

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.02193 [pdf, other]

LDEB -- Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues

Authors: Amitabha Dey, Shan Suthaharan

Abstract: Emotion recognition in conversations (ERC) is vital to the advancements of conversational AI and its applications. Therefore, the development of an automated ERC model using the concepts of machine learning (ML) would be beneficial. However, the conversational dialogues present a unique problem where each dialogue depicts nested emotions that entangle the association between the emotional feature… ▽ More Emotion recognition in conversations (ERC) is vital to the advancements of conversational AI and its applications. Therefore, the development of an automated ERC model using the concepts of machine learning (ML) would be beneficial. However, the conversational dialogues present a unique problem where each dialogue depicts nested emotions that entangle the association between the emotional feature descriptors and emotion type (or label). This entanglement that can be multiplied with the presence of data paucity is an obstacle for a ML model. To overcome this problem, we proposed a novel approach called Label Digitization with Emotion Binarization (LDEB) that disentangles the twists by utilizing the text normalization and 7-bit digital encoding techniques and constructs a meaningful feature space for a ML model to be trained. We also utilized the publicly available dataset called the FETA-DailyDialog dataset for feature learning and developed a hierarchical ERC model using random forest (RF) and artificial neural network (ANN) classifiers. Simulations showed that the ANN-based ERC model was able to predict emotion with the best accuracy and precision scores of about 74% and 76%, respectively. Simulations also showed that the ANN-model could reach a training accuracy score of about 98% with 60 epochs. On the other hand, the RF-based ERC model was able to predict emotions with the best accuracy and precision scores of about 78% and 75%, respectively. △ Less

Submitted 3 June, 2023; originally announced June 2023.

Comments: 10 pages, 3 figures, 4 tables

arXiv:2303.13974

Mixed-Type Wafer Classification For Low Memory Devices Using Knowledge Distillation

Authors: Nitish Shukla, Anurima Dey, Srivatsan K

Abstract: Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may further provide insight for yield improvement in wafer foundry. During manufacturing, various defects may appear standalone in the wafer or may appear as different combinations. Identifying multiple defects… ▽ More Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may further provide insight for yield improvement in wafer foundry. During manufacturing, various defects may appear standalone in the wafer or may appear as different combinations. Identifying multiple defects in a wafer is generally harder compared to identifying a single defect. Recently, deep learning methods have gained significant traction in mixed-type DPR. However, the complexity of defects requires complex and large models making them very difficult to operate on low-memory embedded devices typically used in fabrication labs. Another common issue is the unavailability of labeled data to train complex networks. In this work, we propose an unsupervised training routine to distill the knowledge of complex pre-trained models to lightweight deployment-ready models. We empirically show that this type of training compresses the model without sacrificing accuracy despite being up to 10 times smaller than the teacher model. The compressed model also manages to outperform contemporary state-of-the-art models. △ Less

Submitted 18 October, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Study is not relevant

arXiv:2302.10859 [pdf, other]

doi 10.1016/j.compmedimag.2023.102279

SF2Former: Amyotrophic Lateral Sclerosis Identification From Multi-center MRI Data Using Spatial and Frequency Fusion Transformer

Authors: Rafsanjany Kushol, Collin C. Luk, Avyarthana Dey, Michael Benatar, Hannah Briemberg, Annie Dionne, Nicolas Dupré, Richard Frayne, Angela Genge, Summer Gibson, Simon J. Graham, Lawrence Korngut, Peter Seres, Robert C. Welsh, Alan Wilman, Lorne Zinman, Sanjay Kalra, Yee-Hong Yang

Abstract: Amyotrophic Lateral Sclerosis (ALS) is a complex neurodegenerative disorder involving motor neuron degeneration. Significant research has begun to establish brain magnetic resonance imaging (MRI) as a potential biomarker to diagnose and monitor the state of the disease. Deep learning has turned into a prominent class of machine learning programs in computer vision and has been successfully employe… ▽ More Amyotrophic Lateral Sclerosis (ALS) is a complex neurodegenerative disorder involving motor neuron degeneration. Significant research has begun to establish brain magnetic resonance imaging (MRI) as a potential biomarker to diagnose and monitor the state of the disease. Deep learning has turned into a prominent class of machine learning programs in computer vision and has been successfully employed to solve diverse medical image analysis tasks. However, deep learning-based methods applied to neuroimaging have not achieved superior performance in ALS patients classification from healthy controls due to having insignificant structural changes correlated with pathological features. Therefore, the critical challenge in deep models is to determine useful discriminative features with limited training data. By exploiting the long-range relationship of image features, this study introduces a framework named SF2Former that leverages vision transformer architecture's power to distinguish the ALS subjects from the control group. To further improve the network's performance, spatial and frequency domain information are combined because MRI scans are captured in the frequency domain before being converted to the spatial domain. The proposed framework is trained with a set of consecutive coronal 2D slices, which uses the pre-trained weights on ImageNet by leveraging transfer learning. Finally, a majority voting scheme has been employed to those coronal slices of a particular subject to produce the final classification decision. Our proposed architecture has been thoroughly assessed with multi-modal neuroimaging data using two well-organized versions of the Canadian ALS Neuroimaging Consortium (CALSNIC) multi-center datasets. The experimental results demonstrate the superiority of our proposed strategy in terms of classification accuracy compared with several popular deep learning-based techniques. △ Less

Submitted 28 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: 17 pages, 8 figures

Journal ref: Computerized Medical Imaging and Graphics Volume 108, September 2023, 102279

arXiv:2211.13915 [pdf, ps, other]

Confidence Interval Construction for Multivariate time series using Long Short Term Memory Network

Authors: Aryan Bhambu, Arabin Kumar Dey

Abstract: In this paper we propose a novel procedure to construct a confidence interval for multivariate time series predictions using long short term memory network. The construction uses a few novel block bootstrap techniques. We also propose an innovative block length selection procedure for each of these schemes. Two novel benchmarks help us to compare the construction of this confidence intervals by di… ▽ More In this paper we propose a novel procedure to construct a confidence interval for multivariate time series predictions using long short term memory network. The construction uses a few novel block bootstrap techniques. We also propose an innovative block length selection procedure for each of these schemes. Two novel benchmarks help us to compare the construction of this confidence intervals by different bootstrap techniques. We illustrate the whole construction through S\&P $500$ and Dow Jones Index datasets. △ Less

Submitted 25 November, 2022; originally announced November 2022.

arXiv:2211.06581 [pdf, other]

Variational Augmentation for Enhancing Historical Document Image Binarization

Authors: Avirup Dey, Nibaran Das, Mita Nasipuri

Abstract: Historical Document Image Binarization is a well-known segmentation problem in image processing. Despite ubiquity, traditional thresholding algorithms achieved limited success on severely degraded document images. With the advent of deep learning, several segmentation models were proposed that made significant progress in the field but were limited by the unavailability of large training datasets.… ▽ More Historical Document Image Binarization is a well-known segmentation problem in image processing. Despite ubiquity, traditional thresholding algorithms achieved limited success on severely degraded document images. With the advent of deep learning, several segmentation models were proposed that made significant progress in the field but were limited by the unavailability of large training datasets. To mitigate this problem, we have proposed a novel two-stage framework -- the first of which comprises a generator that generates degraded samples using variational inference and the second being a CNN-based binarization network that trains on the generated data. We evaluated our framework on a range of DIBCO datasets, where it achieved competitive results against previous state-of-the-art methods. △ Less

Submitted 12 November, 2022; originally announced November 2022.

Comments: Accepted at ICVGIP 2022

MSC Class: I.4.6

arXiv:2211.02733 [pdf, other]

GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization

Authors: Xuhai Xu, Han Zhang, Yasaman Sefidgar, Yiyi Ren, Xin Liu, Woosuk Seo, Jennifer Brown, Kevin Kuehn, Mike Merrill, Paula Nurius, Shwetak Patel, Tim Althoff, Margaret E. Morris, Eve Riskin, Jennifer Mankoff, Anind K. Dey

Abstract: Recent research has demonstrated the capability of behavior signals captured by smartphones and wearables for longitudinal behavior modeling. However, there is a lack of a comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring th… ▽ More Recent research has demonstrated the capability of behavior signals captured by smartphones and wearables for longitudinal behavior modeling. However, there is a lack of a comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring the cross-dataset generalizability of these algorithms. We present the first multi-year passive sensing datasets, containing over 700 user-years and 497 unique users' data collected from mobile and wearable sensors, together with a wide range of well-being metrics. Our datasets can support multiple cross-dataset evaluations of behavior modeling algorithms' generalizability across different users and years. As a starting point, we provide the benchmark results of 18 algorithms on the task of depression detection. Our results indicate that both prior depression detection algorithms and domain generalization techniques show potential but need further research to achieve adequate cross-dataset generalizability. We envision our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms. △ Less

Submitted 4 March, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

Comments: Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

MSC Class: 68T09 ACM Class: I.2.1; E.m

arXiv:2210.12006 [pdf, ps, other]

Integrated Brier Score based Survival Cobra -- A regression based approach

Authors: Rahul Goswami, Arabin Kumar Dey

Abstract: Recently Goswami et al. \cite{goswami2022concordance} introduced two novel implementations of combined regression strategy to find the conditional survival function. The paper uses regression-based weak learners and provides an alternative version of the combined regression strategy (COBRA) ensemble using the Integrated Brier Score to predict conditional survival function. We create a novel predic… ▽ More Recently Goswami et al. \cite{goswami2022concordance} introduced two novel implementations of combined regression strategy to find the conditional survival function. The paper uses regression-based weak learners and provides an alternative version of the combined regression strategy (COBRA) ensemble using the Integrated Brier Score to predict conditional survival function. We create a novel predictor based on a weighted version of all machine predictions taking weights as a specific function of normalized Integrated Brier Score. We use two different norms (Frobenius and Sup norm) to extract the proximity points in the algorithm. Our implementations consider right-censored data too. We illustrate the proposed algorithms through some real-life data analysis. △ Less

Submitted 27 October, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: arXiv admin note: text overlap with arXiv:2209.11919

arXiv:2210.10655 [pdf, ps, other]

Controlling Travel Path of Original Cobra

Authors: Mriganka Basu RoyChowdhury, Arabin K Dey

Abstract: In this paper we propose a kernel based COBRA which is a direct approximation of the original COBRA. We propose a novel tuning procedure for original COBRA parameters based on this kernel approximation. We show that our proposed algorithm provides much better accuracy than other COBRAs and faster than usual Gridsearch COBRA. We use two datasets to illustrate our proposed methodology over existing… ▽ More In this paper we propose a kernel based COBRA which is a direct approximation of the original COBRA. We propose a novel tuning procedure for original COBRA parameters based on this kernel approximation. We show that our proposed algorithm provides much better accuracy than other COBRAs and faster than usual Gridsearch COBRA. We use two datasets to illustrate our proposed methodology over existing COBRAs. △ Less

Submitted 15 October, 2022; originally announced October 2022.

Comments: 9 pages; 2 figures

arXiv:2210.02102

An Architectural Approach to Creating a Cloud Application for Developing Microservices

Authors: A. N. M. Sajedul Alam, Junaid Bin Kibria, Al Hasib Mahamud, Arnob Kumar Dey, Hasan Muhammed Zahidul Amin, Md Sabbir Hossain, Annajiat Alim Rasel

Abstract: The cloud is a new paradigm that is paving the way for new approaches and standards. The architectural styles are evolving in response to the cloud's requirements. In recent years, microservices have emerged as the preferred architectural style for scalable, rapidly evolving cloud applications. The adoption of microservices to the detriment of monolithic structures, which are increasingly being ph… ▽ More The cloud is a new paradigm that is paving the way for new approaches and standards. The architectural styles are evolving in response to the cloud's requirements. In recent years, microservices have emerged as the preferred architectural style for scalable, rapidly evolving cloud applications. The adoption of microservices to the detriment of monolithic structures, which are increasingly being phased out, is one of the most significant developments in business architecture. Cloud-native architectures make microservices system deployment more productive, adaptable, and cost-effective. Regardless, many firms have begun to transition from one type of architecture to another, though this is still in its early stages. The primary purpose of this article is to gain a better understanding of how to design microservices through developing cloud apps, as well as current microservices trends, the reason for microservices research, emerging standards, and prospective research gaps. Researchers and practitioners in software engineering can use the data to stay current on SOA and cloud computing developments. △ Less

Submitted 7 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: It is not completed properly yet, I want to withdraw it as an author

arXiv:2209.15293 [pdf]

A Survey: Credit Sentiment Score Prediction

Authors: A. N. M. Sajedul Alam, Junaid Bin Kibria, Arnob Kumar Dey, Zawad Alam, Shifat Zaman, Motahar Mahtab, Mohammed Julfikar Ali Mahbub, Annajiat Alim Rasel

Abstract: Manual approvals are still used by banks and other NGOs to approve loans. It takes time and is prone to mistakes because it is controlled by a bank employee. Several fields of machine learning mining technologies have been utilized to enhance various areas of credit rating forecast. A major goal of this research is to look at current sentiment analysis techniques that are being used to generate cr… ▽ More Manual approvals are still used by banks and other NGOs to approve loans. It takes time and is prone to mistakes because it is controlled by a bank employee. Several fields of machine learning mining technologies have been utilized to enhance various areas of credit rating forecast. A major goal of this research is to look at current sentiment analysis techniques that are being used to generate creditworthiness. △ Less

Submitted 30 September, 2022; originally announced September 2022.

Comments: 16 pages, 3 figures, 3 tables

arXiv:2209.15288 [pdf]

A Survey: Implementations of Non-fungible Token System in Different Fields

Authors: A. N. M. Sajedul Alam, Junaid Bin Kibria, Al Hasib Mahamud, Arnob Kumar Dey, Hasan Muhammed Zahidul Amin, Md Sabbir Hossain, Annajiat Alim Rasel

Abstract: In the realm of digital art and collectibles, NFTs are sweeping the board. Because of the massive sales to a new crypto audience, the livelihoods of digital artists are being transformed. It is no surprise that celebs are jumping on the bandwagon. It is a fact that NFTs can be used in multiple ways, including digital artwork such as animation, character design, digital painting, collection of self… ▽ More In the realm of digital art and collectibles, NFTs are sweeping the board. Because of the massive sales to a new crypto audience, the livelihoods of digital artists are being transformed. It is no surprise that celebs are jumping on the bandwagon. It is a fact that NFTs can be used in multiple ways, including digital artwork such as animation, character design, digital painting, collection of selfies or vlogs, and many more digital entities. As a result, they may be used to signify the possession of any specific object, whether it be digital or physical. NFTs are digital tokens that may be used to indicate ownership of one of a-kind goods. For example, I can buy a shoe or T shirt from any store, and then if the store provides me the same 3D model of that T-Shirt or shoe of the exact same design and color, it would be more connected with my feelings. They enable us to tokenize items such as artwork, valuables, and even real estate. NFTs can only be owned by one person at a time, and they are protected by the Ethereum blockchain no one can alter the ownership record or create a new NFT. The word non-fungible can be used to describe items like your furniture, a song file, or your computer. It is impossible to substitute these goods with anything else because they each have their own distinct characteristics. The goal was to find all the existing implementations of Non-fungible Tokens in different fields of recent technology, so that an overall overview of future implementations of NFT can be found and how it can be used to enrich user experiences. △ Less

Submitted 30 September, 2022; originally announced September 2022.

Comments: 14 pages, 3 figures, 3 tables

arXiv:2209.13836 [pdf, other]

Mutual Information Assisted Ensemble Recommender System for Identifying Critical Risk Factors in Healthcare Prognosis

Authors: Abhishek Dey, Debayan Goswami, Rahul Roy, Susmita Ghosh, Yu Shrike Zhang, Jonathan H. Chan

Abstract: Purpose: Health recommenders act as important decision support systems, aiding patients and medical professionals in taking actions that lead to patients' well-being. These systems extract the information which may be of particular relevance to the end-user, helping them in making appropriate decisions. The present study proposes a feature recommender, as a part of a disease management system, tha… ▽ More Purpose: Health recommenders act as important decision support systems, aiding patients and medical professionals in taking actions that lead to patients' well-being. These systems extract the information which may be of particular relevance to the end-user, helping them in making appropriate decisions. The present study proposes a feature recommender, as a part of a disease management system, that identifies and recommends the most important risk factors for an illness. Methods: A novel mutual information and ensemble-based feature ranking approach for identifying critical risk factors in healthcare prognosis is proposed. Results: To establish the effectiveness of the proposed method, experiments have been conducted on four benchmark datasets of diverse diseases (clear cell renal cell carcinoma (ccRCC), chronic kidney disease, Indian liver patient, and cervical cancer risk factors). The performance of the proposed recommender is compared with four state-of-the-art methods using recommender systems' performance metrics like average precision@K, precision@K, recall@K, F1@K, reciprocal rank@K. The method is able to recommend all relevant critical risk factors for ccRCC. It also attains a higher accuracy (96.6% and 98.6% using support vector machine and neural network, respectively) for ccRCC staging with a reduced feature set as compared to existing methods. Moreover, the top two features recommended using the proposed method with ccRCC, viz. size of tumor and metastasis status, are medically validated from the existing TNM system. Results are also found to be superior for the other three datasets. Conclusion: The proposed recommender can identify and recommend risk factors that have the most discriminating power for detecting diseases. △ Less

Submitted 1 July, 2024; v1 submitted 28 September, 2022; originally announced September 2022.

arXiv:2209.11919 [pdf, ps, other]

Concordance based Survival Cobra with regression type weak learners

Authors: Rahul Goswami, Arabin Kumar Dey

Abstract: In this paper, we predict conditional survival functions through a combined regression strategy. We take weak learners as different random survival trees. We propose to maximize concordance in the right-censored set up to find the optimal parameters. We explore two approaches, a usual survival cobra and a novel weighted predictor based on the concordance index. Our proposed formulations use two di… ▽ More In this paper, we predict conditional survival functions through a combined regression strategy. We take weak learners as different random survival trees. We propose to maximize concordance in the right-censored set up to find the optimal parameters. We explore two approaches, a usual survival cobra and a novel weighted predictor based on the concordance index. Our proposed formulations use two different norms, say, Max-norm and Frobenius norm, to find a proximity set of predictions from query points in the test dataset. We illustrate our algorithms through three different real-life dataset implementations. △ Less

Submitted 11 October, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

arXiv:2209.11677 [pdf, other]

PNeRF: Probabilistic Neural Scene Representations for Uncertain 3D Visual Mapping

Authors: Yassine Ahmine, Arnab Dey, Andrew I. Comport

Abstract: Recently neural scene representations have provided very impressive results for representing 3D scenes visually, however, their study and progress have mainly been limited to visualization of virtual models in computer graphics or scene reconstruction in computer vision without explicitly accounting for sensor and pose uncertainty. Using this novel scene representation in robotics applications, ho… ▽ More Recently neural scene representations have provided very impressive results for representing 3D scenes visually, however, their study and progress have mainly been limited to visualization of virtual models in computer graphics or scene reconstruction in computer vision without explicitly accounting for sensor and pose uncertainty. Using this novel scene representation in robotics applications, however, would require accounting for this uncertainty in the neural map. The aim of this paper is therefore to propose a novel method for training {\em probabilistic neural scene representations} with uncertain training data that could enable the inclusion of these representations in robotics applications. Acquiring images using cameras or depth sensors contains inherent uncertainty, and furthermore, the camera poses used for learning a 3D model are also imperfect. If these measurements are used for training without accounting for their uncertainty, then the resulting models are non-optimal, and the resulting scene representations are likely to contain artifacts such as blur and un-even geometry. In this work, the problem of uncertainty integration to the learning process is investigated by focusing on training with uncertain information in a probabilistic manner. The proposed method involves explicitly augmenting the training likelihood with an uncertainty term such that the learnt probability distribution of the network is minimized with respect to the training uncertainty. It will be shown that this leads to more accurate image rendering quality, in addition to more precise and consistent geometry. Validation has been carried out on both synthetic and real datasets showing that the proposed approach outperforms state-of-the-art methods. The results show notably that the proposed method is capable of rendering novel high-quality views even when the training data is limited. △ Less

Submitted 23 September, 2022; originally announced September 2022.

Comments: 7 Pages, 6 Figures, 5 Tables. Submitted to IEEE International Conference on Robotics and Automation 2023 (ICRA 2023)

arXiv:2206.09074 [pdf, other]

Weakly Supervised Classification of Vital Sign Alerts as Real or Artifact

Authors: Arnab Dey, Mononito Goswami, Joo Heung Yoon, Gilles Clermont, Michael Pinsky, Marilyn Hravnak, Artur Dubrawski

Abstract: A significant proportion of clinical physiologic monitoring alarms are false. This often leads to alarm fatigue in clinical personnel, inevitably compromising patient safety. To combat this issue, researchers have attempted to build Machine Learning (ML) models capable of accurately adjudicating Vital Sign (VS) alerts raised at the bedside of hemodynamically monitored patients as real or artifact.… ▽ More A significant proportion of clinical physiologic monitoring alarms are false. This often leads to alarm fatigue in clinical personnel, inevitably compromising patient safety. To combat this issue, researchers have attempted to build Machine Learning (ML) models capable of accurately adjudicating Vital Sign (VS) alerts raised at the bedside of hemodynamically monitored patients as real or artifact. Previous studies have utilized supervised ML techniques that require substantial amounts of hand-labeled data. However, manually harvesting such data can be costly, time-consuming, and mundane, and is a key factor limiting the widespread adoption of ML in healthcare (HC). Instead, we explore the use of multiple, individually imperfect heuristics to automatically assign probabilistic labels to unlabeled training data using weak supervision. Our weakly supervised models perform competitively with traditional supervised techniques and require less involvement from domain experts, demonstrating their use as efficient and practical alternatives to supervised learning in HC applications of ML. △ Less

Submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted at American Medical Informatics Association (AMIA) Annual Symposium 2022. 10 pages, 4 figures and 2 tables

arXiv:2206.00597 [pdf, other]

Post-Disaster Repair Crew Assignment Optimization Using Minimum Latency

Authors: Anakin Dey, Melkior Ornik

Abstract: Across infrastructure domains, physical damage caused by storms and other weather events often requires costly and time-sensitive repairs to restore services as quickly as possible. While recent studies have used agent-based models to estimate the cost of repairs, the implemented strategies for assignment of repair crews to different locations are generally human-driven or based on simple rules. I… ▽ More Across infrastructure domains, physical damage caused by storms and other weather events often requires costly and time-sensitive repairs to restore services as quickly as possible. While recent studies have used agent-based models to estimate the cost of repairs, the implemented strategies for assignment of repair crews to different locations are generally human-driven or based on simple rules. In order to find performant strategies, we continue with an agent-based model, but approach this problem as a combinational optimization problem known as the Minimum Weighted Latency Problem for multiple repair crews. We apply a partitioning algorithm that balances the assignment of targets amongst all the crews using two different heuristics that optimize either the importance of repair locations or the travel time between them. We benchmark our algorithm on both randomly generated graphs as well as data derived from a real-world urban environment, and show that our algorithm delivers significantly better assignments than existing methods. △ Less

Submitted 7 August, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: 7 pages, 5 figures

arXiv:2205.09351 [pdf, other]

doi 10.24132/JWSCG.2022.5

Mip-NeRF RGB-D: Depth Assisted Fast Neural Radiance Fields

Authors: Arnab Dey, Yassine Ahmine, Andrew I. Comport

Abstract: Neural scene representations, such as Neural Radiance Fields (NeRF), are based on training a multilayer perceptron (MLP) using a set of color images with known poses. An increasing number of devices now produce RGB-D(color + depth) information, which has been shown to be very important for a wide range of tasks. Therefore, the aim of this paper is to investigate what improvements can be made to th… ▽ More Neural scene representations, such as Neural Radiance Fields (NeRF), are based on training a multilayer perceptron (MLP) using a set of color images with known poses. An increasing number of devices now produce RGB-D(color + depth) information, which has been shown to be very important for a wide range of tasks. Therefore, the aim of this paper is to investigate what improvements can be made to these promising implicit representations by incorporating depth information with the color images. In particular, the recently proposed Mip-NeRF approach, which uses conical frustums instead of rays for volume rendering, allows one to account for the varying area of a pixel with distance from the camera center. The proposed method additionally models depth uncertainty. This allows to address major limitations of NeRF-based approaches including improving the accuracy of geometry, reduced artifacts, faster training time, and shortened prediction time. Experiments are performed on well-known benchmark scenes, and comparisons show improved accuracy in scene geometry and photometric reconstruction, while reducing the training time by 3 - 5 times. △ Less

Submitted 7 November, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Report number: Vol.30., No.1-2, ISSN 1213-6972

Journal ref: Journal of WSCG 2022

arXiv:2203.15587 [pdf, other]

RGB-D Neural Radiance Fields: Local Sampling for Faster Training

Authors: Arnab Dey, Andrew I. Comport

Abstract: Learning a 3D representation of a scene has been a challenging problem for decades in computer vision. Recent advances in implicit neural representation from images using neural radiance fields(NeRF) have shown promising results. Some of the limitations of previous NeRF based methods include longer training time, and inaccurate underlying geometry. The proposed method takes advantage of RGB-D data… ▽ More Learning a 3D representation of a scene has been a challenging problem for decades in computer vision. Recent advances in implicit neural representation from images using neural radiance fields(NeRF) have shown promising results. Some of the limitations of previous NeRF based methods include longer training time, and inaccurate underlying geometry. The proposed method takes advantage of RGB-D data to reduce training time by leveraging depth sensing to improve local sampling. This paper proposes a depth-guided local sampling strategy and a smaller neural network architecture to achieve faster training time without compromising quality. △ Less

Submitted 7 November, 2022; v1 submitted 26 March, 2022; originally announced March 2022.

arXiv:2202.00596 [pdf, other]

Machine learning based modelling and optimization in hard turning of AISI D6 steel with newly developed AlTiSiN coated carbide tool

Authors: A Das, S R Das, J P Panda, A Dey, K K Gajrani, N Somani, N Gupta

Abstract: In recent times Mechanical and Production industries are facing increasing challenges related to the shift toward sustainable manufacturing. In this article, machining was performed in dry cutting condition with a newly developed coated insert called AlTiSiN coated carbides coated through scalable pulsed power plasma technique in dry cutting condition and a dataset was generated for different mach… ▽ More In recent times Mechanical and Production industries are facing increasing challenges related to the shift toward sustainable manufacturing. In this article, machining was performed in dry cutting condition with a newly developed coated insert called AlTiSiN coated carbides coated through scalable pulsed power plasma technique in dry cutting condition and a dataset was generated for different machining parameters and output responses. The machining parameters are speed, feed, depth of cut and the output responses are surface roughness, cutting force, crater wear length, crater wear width, and flank wear. The data collected from the machining operation is used for the development of machine learning (ML) based surrogate models to test, evaluate and optimize various input machining parameters. Different ML approaches such as polynomial regression (PR), random forest (RF) regression, gradient boosted (GB) trees, and adaptive boosting (AB) based regression are used to model different output responses in the hard machining of AISI D6 steel. The surrogate models for different output responses are used to prepare a complex objective function for the germinal center algorithm-based optimization of the machining parameters of the hard turning operation. △ Less

Submitted 30 January, 2022; originally announced February 2022.

arXiv:2201.03074 [pdf, other]

A Survey of Passive Sensing in the Workplace

Authors: Subigya Nepal, Gonzalo J. Martinez, Arvind Pillai, Koustuv Saha, Shayan Mirjafari, Vedant Das Swain, Xuhai Xu, Pino G. Audia, Munmun De Choudhury, Anind K. Dey, Aaron Striegel, Andrew T. Campbell

Abstract: As emerging technologies increasingly integrate into all facets of our lives, the workplace stands at the forefront of potential transformative changes. A notable development in this realm is the advent of passive sensing technology, designed to enhance both cognitive and physical capabilities by monitoring human behavior. This paper reviews current research on the application of passive sensing t… ▽ More As emerging technologies increasingly integrate into all facets of our lives, the workplace stands at the forefront of potential transformative changes. A notable development in this realm is the advent of passive sensing technology, designed to enhance both cognitive and physical capabilities by monitoring human behavior. This paper reviews current research on the application of passive sensing technology in the workplace, focusing on its impact on employee wellbeing and productivity. Additionally, we explore unresolved issues and outline prospective pathways for the incorporation of passive sensing in future workplaces. △ Less

Submitted 30 March, 2024; v1 submitted 9 January, 2022; originally announced January 2022.

Comments: Added references and other minor revisions. Also udated to include relevant works published after 2022

ACM Class: H.5.0

arXiv:2111.14762 [pdf, other]

Riemannian Functional Map Synchronization for Probabilistic Partial Correspondence in Shape Networks

Authors: Faria Huq, Adrish Dey, Sahra Yusuf, Dena Bazazian, Tolga Birdal, Nina Miolane

Abstract: We consider the problem of graph-matching on a network of 3D shapes with uncertainty quantification. We assume that the pairwise shape correspondences are efficiently represented as \emph{functional maps}, that match real-valued functions defined over pairs of shapes. By modeling functional maps between nearly isometric shapes as elements of the Lie group $SO(n)$, we employ \emph{synchronization}… ▽ More We consider the problem of graph-matching on a network of 3D shapes with uncertainty quantification. We assume that the pairwise shape correspondences are efficiently represented as \emph{functional maps}, that match real-valued functions defined over pairs of shapes. By modeling functional maps between nearly isometric shapes as elements of the Lie group $SO(n)$, we employ \emph{synchronization} to enforce cycle consistency of the collection of functional maps over the graph, hereby enhancing the accuracy of the individual maps. We further introduce a tempered Bayesian probabilistic inference framework on $SO(n)$. Our framework enables: (i) synchronization of functional maps as maximum-a-posteriori estimation on the Riemannian manifold of functional maps, (ii) sampling the solution space in our energy based model so as to quantify uncertainty in the synchronization problem. We dub the latter \emph{Riemannian Langevin Functional Map (RLFM) Sampler}. Our experiments demonstrate that constraining the synchronization on the Riemannian manifold $SO(n)$ improves the estimation of the functional maps, while our RLFM sampler provides for the first time an uncertainty quantification of the results. △ Less

Submitted 3 January, 2023; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: 16 pages

arXiv:2111.13266 [pdf, other]

Examining Needs and Opportunities for Supporting Students Who Experience Discrimination

Authors: Yasaman S. Sefidgar, Paula S. Nurius, Amanda Baughan, Lisa A. Elkin, Anind K. Dey, Eve Riskin, Jennifer Mankoff, Margaret E. Morris

Abstract: Perceived discrimination is common and consequential. Yet, little support is available to ease handling of these experiences. Addressing this gap, we report on a need-finding study to guide us in identifying relevant technologies and their requirements. Specifically, we examined unfolding experiences of perceived discrimination among college students and found factors to address in providing meani… ▽ More Perceived discrimination is common and consequential. Yet, little support is available to ease handling of these experiences. Addressing this gap, we report on a need-finding study to guide us in identifying relevant technologies and their requirements. Specifically, we examined unfolding experiences of perceived discrimination among college students and found factors to address in providing meaningful support. We used semi-structured retrospective interviews with 14 students to understand their perceptions, emotions, and coping in response to discriminatory behaviors within the prior ten-week period. These 14 students were among 90 who provided experience sampling reports of unfair treatment over the same ten-week period. We found that discrimination is more distressing if students face related academic and social struggles or when the incident triggers beliefs of inefficacy. We additionally identified patterns of effective coping. By grounding the findings in an extended stress processing framework, we offer a principled approach to intervention design, which we illustrate through incident-specific and proactive intervention paradigms. △ Less

Submitted 25 November, 2021; originally announced November 2021.

ACM Class: J.4

arXiv:2111.11785 [pdf, other]

Realistic simulation of users for IT systems in cyber ranges

Authors: Alexandre Dey, Benjamin Costé, Éric Totel, Adrien Bécue

Abstract: Generating user activity is a key capability for both evaluating security monitoring tools as well as improving the credibility of attacker analysis platforms (e.g., honeynets). In this paper, to generate this activity, we instrument each machine by means of an external agent. This agent combines both deterministic and deep learning based methods to adapt to different environment (e.g., multiple O… ▽ More Generating user activity is a key capability for both evaluating security monitoring tools as well as improving the credibility of attacker analysis platforms (e.g., honeynets). In this paper, to generate this activity, we instrument each machine by means of an external agent. This agent combines both deterministic and deep learning based methods to adapt to different environment (e.g., multiple OS, software versions, etc.), while maintaining high performances. We also propose conditional text generation models to facilitate the creation of conversations and documents to accelerate the definition of coherent, system-wide, life scenarios. △ Less

Submitted 23 November, 2021; originally announced November 2021.

Comments: in French

Journal ref: CAID 2021 : applications de l'Intelligence Artificielle aux probl{é}matiques d{é}fense, Nov 2021, Rennes, France

arXiv:2109.02160 [pdf, other]

Urban Fire Station Location Planning using Predicted Demand and Service Quality Index

Authors: Arnab Dey, Andrew Heger, Darin England

Abstract: In this article, we propose a systematic approach for fire station location planning. We develop machine learning models, based on Random Forest and Extreme Gradient Boosting, for demand prediction and utilize the models further to define a generalized index to measure quality of fire service in urban settings. Our model is built upon spatial data collected from multiple different sources. Efficac… ▽ More In this article, we propose a systematic approach for fire station location planning. We develop machine learning models, based on Random Forest and Extreme Gradient Boosting, for demand prediction and utilize the models further to define a generalized index to measure quality of fire service in urban settings. Our model is built upon spatial data collected from multiple different sources. Efficacy of proper facility planning depends on choice of candidates where fire stations can be located along with existing stations, if any. Also, the travel time from these candidates to demand locations need to be taken care of to maintain fire safety standard. Here, we propose a travel time based clustering technique to identify suitable candidates. Finally, we develop an optimization problem to select best locations to install new fire stations. Our optimization problem is built upon maximum coverage problem, based on integer programming. We further develop a two-stage stochastic optimization model to characterize the confidence in our decision outcome. We present a detailed experimental study of our proposed approach in collaboration with city of Victoria Fire Department, MN, USA. Our demand prediction model achieves true positive rate of 80% and false positive rate of 20% approximately. We aid Victoria Fire Department to select a location for a new fire station using our approach. We present detailed results on improvement statistics by locating a new facility, as suggested by our methodology, in the city of Victoria. △ Less

Submitted 20 February, 2022; v1 submitted 5 September, 2021; originally announced September 2021.

arXiv:2108.09717 [pdf, other]

doi 10.1109/ACCESS.2022.3186471

EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA

Authors: Arka Ujjal Dey, Ernest Valveny, Gaurav Harit

Abstract: The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely seen or completely unseen scene-text content of an image. We address this zero-shot nature of the problem by proposing the generalized use of external knowledge to augment our understanding of the scene text. We design a framework to extract, validate, and reason with knowledge using a standard mul… ▽ More The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely seen or completely unseen scene-text content of an image. We address this zero-shot nature of the problem by proposing the generalized use of external knowledge to augment our understanding of the scene text. We design a framework to extract, validate, and reason with knowledge using a standard multimodal transformer for vision language understanding tasks. Through empirical evidence and qualitative results, we demonstrate how external knowledge can highlight instance-only cues and thus help deal with training data bias, improve answer entity type correctness, and detect multiword named entities. We generate results comparable to the state-of-the-art on three publicly available datasets, under the constraints of similar upstream OCR systems and training data. △ Less

Submitted 15 July, 2022; v1 submitted 22 August, 2021; originally announced August 2021.

Comments: Accepted at IEEE Access

Journal ref: IEEE.ACCESS 10 (2022) 72092-72106

arXiv:2103.10730 [pdf, other]

MuRIL: Multilingual Representations for Indian Languages

Authors: Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, Partha Talukdar

Abstract: India is a multilingual society with 1369 rationalized languages and dialects being spoken across the country (INDIA, 2011). Of these, the 22 scheduled languages have a staggering total of 1.17 billion speakers and 121 languages have more than 10,000 speakers (INDIA, 2011). India also has the second largest (and an ever growing) digital footprint (Statista, 2020). Despite this, today's state-of-th… ▽ More India is a multilingual society with 1369 rationalized languages and dialects being spoken across the country (INDIA, 2011). Of these, the 22 scheduled languages have a staggering total of 1.17 billion speakers and 121 languages have more than 10,000 speakers (INDIA, 2011). India also has the second largest (and an ever growing) digital footprint (Statista, 2020). Despite this, today's state-of-the-art multilingual systems perform suboptimally on Indian (IN) languages. This can be explained by the fact that multilingual language models (LMs) are often trained on 100+ languages together, leading to a small representation of IN languages in their vocabulary and training data. Multilingual LMs are substantially less effective in resource-lean scenarios (Wu and Dredze, 2020; Lauscher et al., 2020), as limited data doesn't help capture the various nuances of a language. One also commonly observes IN language text transliterated to Latin or code-mixed with English, especially in informal settings (for example, on social media platforms) (Rijhwani et al., 2017). This phenomenon is not adequately handled by current state-of-the-art multilingual LMs. To address the aforementioned gaps, we propose MuRIL, a multilingual LM specifically built for IN languages. MuRIL is trained on significantly large amounts of IN text corpora only. We explicitly augment monolingual text corpora with both translated and transliterated document pairs, that serve as supervised cross-lingual signals in training. MuRIL significantly outperforms multilingual BERT (mBERT) on all tasks in the challenging cross-lingual XTREME benchmark (Hu et al., 2020). We also present results on transliterated (native to Latin script) test sets of the chosen datasets and demonstrate the efficacy of MuRIL in handling transliterated data. △ Less

Submitted 2 April, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

arXiv:2102.04212 [pdf]

Understanding health and behavioral trends of successful students through machine learning models

Authors: Abigale Kim, Fateme Nikseresht, Janine M. Dutcher, Michael Tumminia, Daniella Villalba, Sheldon Cohen, Kasey Creswel, David Creswell, Anind K. Dey, Jennifer Mankoff, Afsaneh Doryab

Abstract: This study analyzes patterns of physical, mental, lifestyle, and personality factors in college students in different periods over the course of a semester and models their relationships with students' academic performance. The data analyzed was collected through smartphones and Fitbit. The use of machine learning models derived from the gathered data was employed to observe the extent of students… ▽ More This study analyzes patterns of physical, mental, lifestyle, and personality factors in college students in different periods over the course of a semester and models their relationships with students' academic performance. The data analyzed was collected through smartphones and Fitbit. The use of machine learning models derived from the gathered data was employed to observe the extent of students' behavior associated with their GPA, lifestyle, physical health, mental health, and personality attributes. A mutual agreement method was used in which rather than looking at the accuracy of results, the model parameters and weights of features were used to find common behavioral trends. From the results of the model creation, it was determined that the most significant indicator of academic success defined as a higher GPA, was the places a student spent their time. Lifestyle and personality factors were deemed more significant than mental and physical factors. This study will provide insight into the impact of different factors and the timing of those factors on students' academic performance. △ Less

Submitted 23 January, 2021; originally announced February 2021.

Comments: 10 pages, 6 plots

arXiv:2010.01786 [pdf, other]

Corpora Evaluation and System Bias Detection in Multi-document Summarization

Authors: Alvin Dey, Tanya Chowdhury, Yash Kumar Atri, Tanmoy Chakraborty

Abstract: Multi-document summarization (MDS) is the task of reflecting key points from any set of documents into a concise text paragraph. In the past, it has been used to aggregate news, tweets, product reviews, etc. from various sources. Owing to no standard definition of the task, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents. There is als… ▽ More Multi-document summarization (MDS) is the task of reflecting key points from any set of documents into a concise text paragraph. In the past, it has been used to aggregate news, tweets, product reviews, etc. from various sources. Owing to no standard definition of the task, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents. There is also no standard regarding what constitutes summary information in MDS. Adding to the challenge is the fact that new systems report results on a set of chosen datasets, which might not correlate with their performance on the other datasets. In this paper, we study this heterogeneous task with the help of a few widely used MDS corpora and a suite of state-of-the-art models. We make an attempt to quantify the quality of summarization corpus and prescribe a list of points to consider while proposing a new MDS corpus. Next, we analyze the reason behind the absence of an MDS system which achieves superior performance across all corpora. We then observe the extent to which system metrics are influenced, and bias is propagated due to corpus properties. The scripts to reproduce the experiments in this work are available at https://github.com/LCS2-IIITD/summarization_bias.git. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: 11 pages, 3 tables, 5 figures, Accepted in the Findings of EMNLP, 2020

Showing 1–50 of 68 results for author: Dey, A