Search | arXiv e-print repository

doi 10.1145/3654777.3676327

IRIS: Wireless Ring for Vision-based Smart Home Interaction

Authors: Maruchi Kim, Antonio Glenn, Bandhav Veluri, Yunseo Lee, Eyoel Gebre, Aditya Bagaria, Shwetak Patel, Shyamnath Gollakota

Abstract: Integrating cameras into wireless smart rings has been challenging due to size and power constraints. We introduce IRIS, the first wireless vision-enabled smart ring system for smart home interactions. Equipped with a camera, Bluetooth radio, inertial measurement unit (IMU), and an onboard battery, IRIS meets the small size, weight, and power (SWaP) requirements for ring devices. IRIS is context-a… ▽ More Integrating cameras into wireless smart rings has been challenging due to size and power constraints. We introduce IRIS, the first wireless vision-enabled smart ring system for smart home interactions. Equipped with a camera, Bluetooth radio, inertial measurement unit (IMU), and an onboard battery, IRIS meets the small size, weight, and power (SWaP) requirements for ring devices. IRIS is context-aware, adapting its gesture set to the detected device, and can last for 16-24 hours on a single charge. IRIS leverages the scene semantics to achieve instance-level device recognition. In a study involving 23 participants, IRIS consistently outpaced voice commands, with a higher proportion of participants expressing a preference for IRIS over voice commands regarding toggling a device's state, granular control, and social acceptability. Our work pushes the boundary of what is possible with ring form-factor devices, addressing system challenges and opening up novel interaction capabilities. △ Less

Submitted 25 July, 2024; originally announced July 2024.

Comments: 15 pages, 17 figures, 6 tables, to be published in UIST 2024

arXiv:2407.17380 [pdf]

2D and 3D Deep Learning Models for MRI-based Parkinson's Disease Classification: A Comparative Analysis of Convolutional Kolmogorov-Arnold Networks, Convolutional Neural Networks, and Graph Convolutional Networks

Authors: Salil B Patel, Vicky Goh, James F FitzGerald, Chrystalina A Antoniades

Abstract: Early and accurate diagnosis of Parkinson's Disease (PD) remains challenging. This study compares deep learning architectures for MRI-based PD classification, introducing the first three-dimensional (3D) implementation of Convolutional Kolmogorov-Arnold Networks (ConvKANs), a new approach that combines convolution layers with adaptive, spline-based activations. We evaluated Convolutional Neural Ne… ▽ More Early and accurate diagnosis of Parkinson's Disease (PD) remains challenging. This study compares deep learning architectures for MRI-based PD classification, introducing the first three-dimensional (3D) implementation of Convolutional Kolmogorov-Arnold Networks (ConvKANs), a new approach that combines convolution layers with adaptive, spline-based activations. We evaluated Convolutional Neural Networks (CNNs), ConvKANs, and Graph Convolutional Networks (GCNs) using three open-source datasets; a total of 142 participants (75 with PD and 67 age-matched healthy controls). For 2D analysis, we extracted 100 axial slices centred on the midbrain from each T1-weighted scan. For 3D analysis, we used the entire volumetric scans. ConvKANs integrate learnable B-spline functions with convolutional layers. GCNs represent MRI data as graphs, theoretically capturing structural relationships that may be overlooked by traditional approaches. Interpretability visualizations, including the first ConvKAN spline activation maps, and projections of graph node embeddings, were depicted. ConvKANs demonstrated high performance across datasets and dimensionalities, achieving the highest 2D AUROC (0.98) in one dataset and matching CNN peak 3D performance (1.00). CNN models performed well, while GCN models improved in 3D analyses, reaching up to 0.97 AUROC. 3D implementations yielded higher AUROC values compared to 2D counterparts across all models. ConvKAN implementation shows promise for MRI analysis in PD classification, particularly in the context of early diagnosis. The improvement in 3D analyses highlights the value of volumetric data in capturing subtle PD-related changes. While MRI is not currently used for PD diagnosis, these findings suggest its potential as a component of a multimodal diagnostic approach, especially for early detection. △ Less

Submitted 24 July, 2024; originally announced July 2024.

Comments: 19 Pages, 5 figures

arXiv:2407.16814 [pdf, other]

Quantum Constacyclic BCH Codes over Qudits: A Spectral-Domain Approach

Authors: Shikha Patel, Shayan Srinivasa Garani

Abstract: We characterize constacyclic codes in the spectral domain using the finite field Fourier transform (FFFT) and propose a reduced complexity method for the spectral-domain decoder. Further, we also consider repeated-root constacyclic codes and characterize them in terms of symmetric and asymmetric $q$-cyclotomic cosets. Using zero sets of classical self-orthogonal and dual-containing codes, we deriv… ▽ More We characterize constacyclic codes in the spectral domain using the finite field Fourier transform (FFFT) and propose a reduced complexity method for the spectral-domain decoder. Further, we also consider repeated-root constacyclic codes and characterize them in terms of symmetric and asymmetric $q$-cyclotomic cosets. Using zero sets of classical self-orthogonal and dual-containing codes, we derive quantum error correcting codes (QECCs) for both constacyclic Bose-Chaudhuri-Hocquenghem (BCH) codes and repeated-root constacyclic codes. We provide some examples of QECCs derived from repeated-root constacyclic codes and show that constacyclic BCH codes are more efficient than repeated-root constacyclic codes. Finally, quantum encoders and decoders are also proposed in the transform domain for Calderbank-Shor-Steane CSS-based quantum codes. Since constacyclic codes are a generalization of cyclic codes with better minimum distance than cyclic codes with the same code parameters, the proposed results are practically useful. △ Less

Submitted 23 July, 2024; originally announced July 2024.

Comments: 40 pages, 2 figures

arXiv:2407.09688 [pdf, other]

Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction

Authors: Chase Fensore, Rodrigo M. Carrillo-Larco, Shivani A. Patel, Alanna A. Morris, Joyce C. Ho

Abstract: Social determinants of health (SDOH) $-$ the myriad of circumstances in which people live, grow, and age $-$ play an important role in health outcomes. However, existing outcome prediction models often only use proxies of SDOH as features. Recent open data initiatives present an opportunity to construct a more comprehensive view of SDOH, but manually integrating the most relevant data for individu… ▽ More Social determinants of health (SDOH) $-$ the myriad of circumstances in which people live, grow, and age $-$ play an important role in health outcomes. However, existing outcome prediction models often only use proxies of SDOH as features. Recent open data initiatives present an opportunity to construct a more comprehensive view of SDOH, but manually integrating the most relevant data for individual patients becomes increasingly challenging as the volume and diversity of public SDOH data grows. Large language models (LLMs) have shown promise at automatically annotating structured data. Here, we conduct an end-to-end case study evaluating the feasibility of using LLMs to integrate SDOH data, and the utility of these SDOH features for clinical prediction. We first manually label 700+ variables from two publicly-accessible SDOH data sources to one of five semantic SDOH categories. Then, we benchmark performance of 9 open-source LLMs on this classification task. Finally, we train ML models to predict 30-day hospital readmission among 39k heart failure (HF) patients, and we compare the prediction performance of the categorized SDOH variables with standard clinical variables. Additionally, we investigate the impact of few-shot LLM prompting on LLM annotation performance, and perform a metadata ablation study on prompts to evaluate which information helps LLMs accurately annotate these variables. We find that some open-source LLMs can effectively, accurately annotate SDOH variables with zero-shot prompting without the need for fine-tuning. Crucially, when combined with standard clinical features, the LLM-annotated Neighborhood and Built Environment subset of the SDOH variables shows the best performance predicting 30-day readmission of HF patients. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 36 pages including references and appendix. This is a work in progress

arXiv:2407.07277 [pdf, other]

Lifestyle-Informed Personalized Blood Biomarker Prediction via Novel Representation Learning

Authors: A. Ali Heydari, Naghmeh Rezaei, Javier L. Prieto, Shwetak N. Patel, Ahmed A. Metwally

Abstract: Blood biomarkers are an essential tool for healthcare providers to diagnose, monitor, and treat a wide range of medical conditions. Current reference values and recommended ranges often rely on population-level statistics, which may not adequately account for the influence of inter-individual variability driven by factors such as lifestyle and genetics. In this work, we introduce a novel framework… ▽ More Blood biomarkers are an essential tool for healthcare providers to diagnose, monitor, and treat a wide range of medical conditions. Current reference values and recommended ranges often rely on population-level statistics, which may not adequately account for the influence of inter-individual variability driven by factors such as lifestyle and genetics. In this work, we introduce a novel framework for predicting future blood biomarker values and define personalized references through learned representations from lifestyle data (physical activity and sleep) and blood biomarkers. Our proposed method learns a similarity-based embedding space that captures the complex relationship between biomarkers and lifestyle factors. Using the UK Biobank (257K participants), our results show that our deep-learned embeddings outperform traditional and current state-of-the-art representation learning techniques in predicting clinical diagnosis. Using a subset of UK Biobank of 6440 participants who have follow-up visits, we validate that the inclusion of these embeddings and lifestyle factors directly in blood biomarker models improves the prediction of future lab values from a single lab visit. This personalized modeling approach provides a foundation for developing more accurate risk stratification tools and tailoring preventative care strategies. In clinical settings, this translates to the potential for earlier disease detection, more timely interventions, and ultimately, a shift towards personalized healthcare. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.01216 [pdf, other]

Let Hybrid A* Path Planner Obey Traffic Rules: A Deep Reinforcement Learning-Based Planning Framework

Authors: Xibo Li, Shruti Patel, Christof Büskens

Abstract: Deep reinforcement learning (DRL) allows a system to interact with its environment and take actions by training an efficient policy that maximizes self-defined rewards. In autonomous driving, it can be used as a strategy for high-level decision making, whereas low-level algorithms such as the hybrid A* path planning have proven their ability to solve the local trajectory planning problem. In this… ▽ More Deep reinforcement learning (DRL) allows a system to interact with its environment and take actions by training an efficient policy that maximizes self-defined rewards. In autonomous driving, it can be used as a strategy for high-level decision making, whereas low-level algorithms such as the hybrid A* path planning have proven their ability to solve the local trajectory planning problem. In this work, we combine these two methods where the DRL makes high-level decisions such as lane change commands. After obtaining the lane change command, the hybrid A* planner is able to generate a collision-free trajectory to be executed by a model predictive controller (MPC). In addition, the DRL algorithm is able to keep the lane change command consistent within a chosen time-period. Traffic rules are implemented using linear temporal logic (LTL), which is then utilized as a reward function in DRL. Furthermore, we validate the proposed method on a real system to demonstrate its feasibility from simulation to implementation on real hardware. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.06474 [pdf, other]

Towards a Personal Health Large Language Model

Authors: Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra , et al. (9 additional authors not shown)

Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We… ▽ More In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 72 pages

arXiv:2406.06464 [pdf, other]

Transforming Wearable Data into Health Insights using Large Language Model Agents

Authors: Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, Xin Liu

Abstract: Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising… ▽ More Despite the proliferation of wearable health trackers and the importance of sleep and exercise to health, deriving actionable personalized insights from wearable data remains a challenge because doing so requires non-trivial open-ended analysis of these data. The recent rise of large language model (LLM) agents, which can use tools to reason about and interact with the world, presents a promising opportunity to enable such personalized analysis at scale. Yet, the application of LLM agents in analyzing personal health is still largely untapped. In this paper, we introduce the Personal Health Insights Agent (PHIA), an agent system that leverages state-of-the-art code generation and information retrieval tools to analyze and interpret behavioral health data from wearables. We curate two benchmark question-answering datasets of over 4000 health insights questions. Based on 650 hours of human and expert evaluation we find that PHIA can accurately address over 84% of factual numerical questions and more than 83% of crowd-sourced open-ended questions. This work has implications for advancing behavioral health across the population, potentially enabling individuals to interpret their own wearable data, and paving the way for a new era of accessible, personalized wellness regimens that are informed by data-driven insights. △ Less

Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: 38 pages

arXiv:2406.02554 [pdf, other]

Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-rel… ▽ More In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-related behaviors. To facilitate this new research direction, we collected an audio-visual autism spectrum dataset (AV-ASD), currently the largest video dataset for autism screening using a behavioral approach. It covers an extensive range of autism-associated behaviors, including those related to social communication and interaction. To pave the way for further research on this new problem, we intensively explored leveraging foundation models and multimodal large language models across different modalities. Our experiments on the AV-ASD dataset demonstrate that integrating audio, visual, and speech modalities significantly enhances the performance in autism behavior recognition. Additionally, we explored the use of a post-hoc to ad-hoc pipeline in a multimodal large language model to investigate its potential to augment the model's explanatory capability during autism behavior recognition. We will release our dataset, code, and pre-trained models. △ Less

Submitted 22 March, 2024; originally announced June 2024.

arXiv:2406.01915 [pdf, other]

Enhancing Human-Robot Collaborative Assembly in Manufacturing Systems Using Large Language Models

Authors: Jonghan Lim, Sujani Patel, Alex Evans, John Pimley, Yifei Li, Ilya Kovalenko

Abstract: The development of human-robot collaboration has the ability to improve manufacturing system performance by leveraging the unique strengths of both humans and robots. On the shop floor, human operators contribute with their adaptability and flexibility in dynamic situations, while robots provide precision and the ability to perform repetitive tasks. However, the communication gap between human ope… ▽ More The development of human-robot collaboration has the ability to improve manufacturing system performance by leveraging the unique strengths of both humans and robots. On the shop floor, human operators contribute with their adaptability and flexibility in dynamic situations, while robots provide precision and the ability to perform repetitive tasks. However, the communication gap between human operators and robots limits the collaboration and coordination of human-robot teams in manufacturing systems. Our research presents a human-robot collaborative assembly framework that utilizes a large language model for enhancing communication in manufacturing environments. The framework facilitates human-robot communication by integrating voice commands through natural language for task management. A case study for an assembly task demonstrates the framework's ability to process natural language inputs and address real-time assembly challenges, emphasizing adaptability to language variation and efficiency in error resolution. The results suggest that large language models have the potential to improve human-robot interaction for collaborative manufacturing assembly applications. △ Less

Submitted 21 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00010 [pdf, other]

EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search

Authors: Kamalkumar Rathinasamy, Jayarama Nettar, Amit Kumar, Vishal Manchanda, Arun Vijayakumar, Ayush Kataria, Venkateshprasanna Manjunath, Chidambaram GS, Jaskirat Singh Sodhi, Shoeb Shaikh, Wasim Akhtar Khan, Prashant Singh, Tanishq Dattatray Ige, Vipin Tiwari, Rajab Ali Mondal, Harshini K, S Reka, Chetana Amancharla, Faiz ur Rahman, Harikrishnan P A, Indraneel Saha, Bhavya Tiwary, Navin Shankar Patel, Pradeep T S, Balaji A J , et al. (2 additional authors not shown)

Abstract: Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract relevant insights to address employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components.… ▽ More Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract relevant insights to address employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components. While pre-trained embeddings may exhibit proximity or disparity based on their original training objectives, they might not fully align with the unique characteristics of enterprise-specific data, leading to suboptimal alignment with the retrieval goals of enterprise environments. In this paper, we propose a methodology to fine-tune pre-trained embedding models specifically for enterprise environments. By adapting the embeddings to better suit the retrieval tasks prevalent in enterprises, we aim to enhance the performance of information retrieval solutions. We discuss the process of fine-tuning, its effect on retrieval accuracy, and the potential benefits for enterprise information management. Our findings demonstrate the efficacy of fine-tuned embedding models in improving the precision and relevance of search results in enterprise settings. △ Less

Submitted 18 May, 2024; originally announced June 2024.

ACM Class: I.2.7

arXiv:2405.19338 [pdf, other]

Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D-on-board imaging(OBI) unavailable. But tumor visibility is constrained due to the projection of patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment room with 3D-OBI such as cone beam CT(CBCT), the field of view(FOV) of CBCT is limited with unnecessarily high imag… ▽ More In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D-on-board imaging(OBI) unavailable. But tumor visibility is constrained due to the projection of patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment room with 3D-OBI such as cone beam CT(CBCT), the field of view(FOV) of CBCT is limited with unnecessarily high imaging dose, thus unfavorable for pediatric patients. A solution to this dilemma is to reconstruct 3D CT from kV images obtained at the treatment position. Here, we propose a dual-models framework built with hierarchical ViT blocks. Unlike a proof-of-concept approach, our framework considers kV images as the solo input and can synthesize accurate, full-size 3D CT in real time(within milliseconds). We demonstrate the feasibility of the proposed approach on 10 patients with head and neck (H&N) cancer using image quality(MAE: <45HU), dosimetrical accuracy(Gamma passing rate (2%/2mm/10%)>97%) and patient position uncertainty(shift error: <0.4mm). The proposed framework can generate accurate 3D CT faithfully mirroring real-time patient position, thus significantly improving patient setup accuracy, keeping imaging dose minimum, and maintaining treatment veracity. △ Less

Submitted 1 April, 2024; originally announced May 2024.

Comments: 17 pages, 8 figures and tables

arXiv:2405.04657 [pdf, other]

ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

Authors: Albert Bou, Morgan Thomas, Sebastian Dittert, Carles Navarro Ramírez, Maciej Majewski, Ye Wang, Shivam Patel, Gary Tresadern, Mazen Ahmad, Vincent Moens, Woody Sherman, Simone Sciabola, Gianni De Fabritiis

Abstract: In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we… ▽ More In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at \url{https://github.com/acellera/acegen-open} and available for use under the MIT license. △ Less

Submitted 22 July, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.18867 [pdf, other]

Feminist Interaction Techniques: Deterring Non-Consensual Screenshots with Interaction Techniques

Authors: Li Qiwei, Francesca Lameiro, Shefali Patel, Cristi-Isaula-Reyes, Eytan Adar, Eric Gilbert, Sarita Schoenebeck

Abstract: Non-consensual Intimate Media (NCIM) refers to the distribution of sexual or intimate content without consent. NCIM is common and causes significant emotional, financial, and reputational harm. We developed Hands-Off, an interaction technique for messaging applications that deters non-consensual screenshots. Hands-Off requires recipients to perform a hand gesture in the air, above the device, to u… ▽ More Non-consensual Intimate Media (NCIM) refers to the distribution of sexual or intimate content without consent. NCIM is common and causes significant emotional, financial, and reputational harm. We developed Hands-Off, an interaction technique for messaging applications that deters non-consensual screenshots. Hands-Off requires recipients to perform a hand gesture in the air, above the device, to unlock media -- which makes simultaneous screenshotting difficult. A lab study shows that Hands-Off gestures are easy to perform and reduce non-consensual screenshots by 67 percent. We conclude by generalizing this approach and introduce the idea of Feminist Interaction Techniques (FIT), interaction techniques that encode feminist values and speak to societal problems, and reflect on FIT's opportunities and limitations. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17347 [pdf, other]

InspectorRAGet: An Introspection Platform for RAG Evaluation

Authors: Kshitij Fadnis, Siva Sankalp Patel, Odellia Boni, Yannis Katsis, Sara Rosenthal, Benjamin Sznajder, Marina Danilevsky

Abstract: Large Language Models (LLM) have become a popular approach for implementing Retrieval Augmented Generation (RAG) systems, and a significant amount of effort has been spent on building good models and metrics. In spite of increased recognition of the need for rigorous evaluation of RAG systems, few tools exist that go beyond the creation of model output and automatic calculation. We present Inspect… ▽ More Large Language Models (LLM) have become a popular approach for implementing Retrieval Augmented Generation (RAG) systems, and a significant amount of effort has been spent on building good models and metrics. In spite of increased recognition of the need for rigorous evaluation of RAG systems, few tools exist that go beyond the creation of model output and automatic calculation. We present InspectorRAGet, an introspection platform for RAG evaluation. InspectorRAGet allows the user to analyze aggregate and instance-level performance of RAG systems, using both human and algorithmic metrics as well as annotator quality. InspectorRAGet is suitable for multiple use cases and is available publicly to the community. The demo video is available at https://youtu.be/MJhe8QIXcEc △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.13502 [pdf, other]

Optimal Non-Adaptive Tolerant Junta Testing via Local Estimators

Authors: Shivam Nadimpalli, Shyamal Patel

Abstract: We give a non-adaptive algorithm that makes $2^{\tilde{O}(\sqrt{k\log(1/\varepsilon_2 - \varepsilon_1)})}$ queries to a Boolean function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$ and distinguishes between $f$ being $\varepsilon_1$-close to some $k$-junta versus $\varepsilon_2$-far from every $k$-junta. At the heart of our algorithm is a local mean estimation procedure for Boolean functions that may be… ▽ More We give a non-adaptive algorithm that makes $2^{\tilde{O}(\sqrt{k\log(1/\varepsilon_2 - \varepsilon_1)})}$ queries to a Boolean function $f:\{\pm 1\}^n \rightarrow \{\pm 1\}$ and distinguishes between $f$ being $\varepsilon_1$-close to some $k$-junta versus $\varepsilon_2$-far from every $k$-junta. At the heart of our algorithm is a local mean estimation procedure for Boolean functions that may be of independent interest. We complement our upper bound with a matching lower bound, improving a recent lower bound obtained by Chen et al. We thus obtain the first tight bounds for a natural property of Boolean functions in the tolerant testing model. △ Less

Submitted 20 April, 2024; originally announced April 2024.

Comments: To appear in STOC 2024

arXiv:2404.11103 [pdf, ps, other]

Distribution-Free Testing of Decision Lists with a Sublinear Number of Queries

Authors: Xi Chen, Yumou Fei, Shyamal Patel

Abstract: We give a distribution-free testing algorithm for decision lists with $\tilde{O}(n^{11/12}/\varepsilon^3)$ queries. This is the first sublinear algorithm for this problem, which shows that, unlike halfspaces, testing is strictly easier than learning for decision lists. Complementing the algorithm, we show that any distribution-free tester for decision lists must make $\tildeΩ(\sqrt{n})$ queries, o… ▽ More We give a distribution-free testing algorithm for decision lists with $\tilde{O}(n^{11/12}/\varepsilon^3)$ queries. This is the first sublinear algorithm for this problem, which shows that, unlike halfspaces, testing is strictly easier than learning for decision lists. Complementing the algorithm, we show that any distribution-free tester for decision lists must make $\tildeΩ(\sqrt{n})$ queries, or draw $\tildeΩ(n)$ samples when the algorithm is sample-based. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: To appear in STOC 2024

arXiv:2404.04729 [pdf, other]

Towards a low carbon proof-of-work blockchain

Authors: Agron Gemajli, Shivam Patel, Phillip G. Bradford

Abstract: Proof of Work (PoW) blockchains burn a lot of energy. Proof-of-work algorithms are expensive by design and often only serve to compute blockchains. In some sense, carbon-based and non-carbon based regional electric power is fungible. So the total carbon and non-carbon electric power mix plays a role. Thus, generally PoW algorithms have large CO$_2$ footprints solely for computing blockchains. A pr… ▽ More Proof of Work (PoW) blockchains burn a lot of energy. Proof-of-work algorithms are expensive by design and often only serve to compute blockchains. In some sense, carbon-based and non-carbon based regional electric power is fungible. So the total carbon and non-carbon electric power mix plays a role. Thus, generally PoW algorithms have large CO$_2$ footprints solely for computing blockchains. A proof of technology is described towards replacing hashcash or other PoW methods with a lottery and proof-of-VM (PoVM) emulation. PoVM emulation is a form of PoW where an autonomous blockchain miner gets a lottery ticket in exchange for providing a VM (virtual Machine) for a specified period. These VMs get their jobs from a job queue. Managing and ensuring, by concensus, that autonomous PoVMs are properly configured and running as expected gives several gaps for a complete practical system. These gaps are discussed. Our system is similar to a number of other blockchain systems. We briefly survey these systems. This paper along with our proof of technology was done as a senior design project. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04525 [pdf, other]

IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings

Authors: Shubham Patel, Divyaksh Shukla, Ashutosh Modi

Abstract: This paper presents our approach for the SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversations. For the Emotion Recognition in Conversations (ERC) task, we utilize a masked-memory network along with speaker participation. We propose a transformer-based speaker-centric model for the Emotion Flip Reasoning (EFR) task. We also introduce Probable Trigger Zone, a region of the… ▽ More This paper presents our approach for the SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversations. For the Emotion Recognition in Conversations (ERC) task, we utilize a masked-memory network along with speaker participation. We propose a transformer-based speaker-centric model for the Emotion Flip Reasoning (EFR) task. We also introduce Probable Trigger Zone, a region of the conversation that is more likely to contain the utterances causing the emotion to flip. For sub-task 3, the proposed approach achieves a 5.9 (F1 score) improvement over the task baseline. The ablation study results highlight the significance of various design choices in the proposed method. △ Less

Submitted 6 April, 2024; originally announced April 2024.

Comments: Accepted at SemEval 2024, NAACL 2024; 10 Pages

arXiv:2404.03130 [pdf, other]

Biodegradable Interactive Materials

Authors: Zhihan Zhang, Mallory Parker, Kuotian Liao, Jerry Cao, Anandghan Waghmare, Joseph Breda, Chris Matsumura, Serena Eley, Eleftheria Roumeli, Shwetak Patel, Vikram Iyer

Abstract: The sense of touch is fundamental to how we interact with the physical and digital world. Conventional interactive surfaces and tactile interfaces use electronic sensors embedded into objects, however this approach poses serious challenges both for environmental sustainability and a future of truly ubiquitous interaction systems where information is encoded into everyday objects. In this work, we… ▽ More The sense of touch is fundamental to how we interact with the physical and digital world. Conventional interactive surfaces and tactile interfaces use electronic sensors embedded into objects, however this approach poses serious challenges both for environmental sustainability and a future of truly ubiquitous interaction systems where information is encoded into everyday objects. In this work, we present Biodegradable Interactive Materials: backyard-compostable interactive interfaces that leverage information encoded in material properties. Inspired by natural systems, we propose an architecture that programmatically encodes multidimensional information into materials themselves and combines them with wearable devices that extend human senses to perceive the embedded data. We combine unrefined biological matter from plants and algae like chlorella with natural minerals like graphite and magnetite to produce materials with varying electrical, magnetic, and surface properties. We perform in-depth analysis using physics models, computational simulations, and real-world experiments to characterize their information density and develop decoding methods. Our passive, chip-less materials can robustly encode 12 bits of information, equivalent to 4096 unique classes. We further develop wearable device prototypes that can decode this information during touch interactions using off-the-shelf sensors. We demonstrate sample applications such as customized buttons, tactile maps, and interactive surfaces. We further demonstrate the natural degradation of these interactive materials in degrade outdoors within 21 days and perform a comparative environmental analysis of the benefits of this approach. △ Less

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.01904 [pdf, ps, other]

Construction of quantum codes from $(γ,Δ)$-cyclic codes

Authors: Om Prakash, Shikha Patel, Habibul Islam

Abstract: Let $\mathbb{F}_q$ be the finite field of $q=p^m$ elements where $p$ is a prime and $m$ is a positive integer. This paper considers $(γ,Δ)$-cyclic codes over a class of finite commutative non-chain rings $\mathscr{R}_{q,s}=\mathbb{F}_q[v_1,v_2,\dots,v_s]/\langle v_i-v_i^2,v_iv_j=v_jv_i=0\rangle$ where $γ$ is an automorphism of $\mathscr{R}_{q,s}$, $Δ$ is a $γ$-derivation of $\mathscr{R}_{q,s}$ and… ▽ More Let $\mathbb{F}_q$ be the finite field of $q=p^m$ elements where $p$ is a prime and $m$ is a positive integer. This paper considers $(γ,Δ)$-cyclic codes over a class of finite commutative non-chain rings $\mathscr{R}_{q,s}=\mathbb{F}_q[v_1,v_2,\dots,v_s]/\langle v_i-v_i^2,v_iv_j=v_jv_i=0\rangle$ where $γ$ is an automorphism of $\mathscr{R}_{q,s}$, $Δ$ is a $γ$-derivation of $\mathscr{R}_{q,s}$ and $1\leq i\neq j\leq s$ for a positive integer $s$. Here, we show that a $(γ,Δ)$-cyclic code of length $n$ over $\mathscr{R}_{q,s}$ is the direct sum of $(θ,\Im)$-cyclic codes of length $n$ over $\mathbb{F}_q$, where $θ$ is an automorphism of $\mathbb{F}_q$ and $\Im$ is a $θ$-derivation of $\mathbb{F}_q$. Further, necessary and sufficient conditions for both $(γ,Δ)$-cyclic and $(θ,\Im)$-cyclic codes to contain their Euclidean duals are established. Finally, we obtain many quantum codes by applying the dual containing criterion on the Gray images of these codes. The obtained codes have better parameters than those available in the literature. △ Less

Submitted 2 April, 2024; originally announced April 2024.

MSC Class: 12Y05; 16Z05; 94B05; 94B35; 94B15

arXiv:2403.09810 [pdf, other]

doi 10.1145/3613904.3642089

LabelAId: Just-in-time AI Interventions for Improving Human Labeling Quality and Domain Knowledge in Crowdsourcing Systems

Authors: Chu Li, Zhihan Zhang, Michael Saugstad, Esteban Safranchik, Minchu Kulkarni, Xiaoyu Huang, Shwetak Patel, Vikram Iyer, Tim Althoff, Jon E. Froehlich

Abstract: Crowdsourcing platforms have transformed distributed problem-solving, yet quality control remains a persistent challenge. Traditional quality control measures, such as prescreening workers and refining instructions, often focus solely on optimizing economic output. This paper explores just-in-time AI interventions to enhance both labeling quality and domain-specific knowledge among crowdworkers. W… ▽ More Crowdsourcing platforms have transformed distributed problem-solving, yet quality control remains a persistent challenge. Traditional quality control measures, such as prescreening workers and refining instructions, often focus solely on optimizing economic output. This paper explores just-in-time AI interventions to enhance both labeling quality and domain-specific knowledge among crowdworkers. We introduce LabelAId, an advanced inference model combining Programmatic Weak Supervision (PWS) with FT-Transformers to infer label correctness based on user behavior and domain knowledge. Our technical evaluation shows that our LabelAId pipeline consistently outperforms state-of-the-art ML baselines, improving mistake inference accuracy by 36.7% with 50 downstream samples. We then implemented LabelAId into Project Sidewalk, an open-source crowdsourcing platform for urban accessibility. A between-subjects study with 34 participants demonstrates that LabelAId significantly enhances label precision without compromising efficiency while also increasing labeler confidence. We discuss LabelAId's success factors, limitations, and its generalizability to other crowdsourced science domains. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.07410 [pdf, ps, other]

Polylog-Competitive Deterministic Local Routing and Scheduling

Authors: Bernhard Haeupler, Shyamal Patel, Antti Roeyskoe, Cliff Stein, Goran Zuzic

Abstract: This paper addresses point-to-point packet routing in undirected networks, which is the most important communication primitive in most networks. The main result proves the existence of routing tables that guarantee a polylog-competitive completion-time $\textbf{deterministically}$: in any undirected network, it is possible to give each node simple stateless deterministic local forwarding rules, su… ▽ More This paper addresses point-to-point packet routing in undirected networks, which is the most important communication primitive in most networks. The main result proves the existence of routing tables that guarantee a polylog-competitive completion-time $\textbf{deterministically}$: in any undirected network, it is possible to give each node simple stateless deterministic local forwarding rules, such that, any adversarially chosen set of packets are delivered as fast as possible, up to polylog factors. All previous routing strategies crucially required randomization for both route selection and packet scheduling. The core technical contribution of this paper is a new local packet scheduling result of independent interest. This scheduling strategy integrates well with recent sparse semi-oblivious path selection strategies. Such strategies deterministically select not one but several candidate paths for each packet and require a global coordinator to select a single good path from those candidates for each packet. Another challenge is that, even if a single path is selected for each packet, no strategy for scheduling packets along low-congestion paths that is both local and deterministic is known. Our novel scheduling strategy utilizes the fact that every semi-oblivious routing strategy uses only a small (polynomial) subset of candidate routes. It overcomes the issue of global coordination by furthermore being provably robust to adversarial noise. This avoids the issue of having to choose a single path per packet because congestion caused by ineffective candidate paths can be treated as noise. Our results imply the first deterministic universally-optimal algorithms in the distributed supported-CONGEST model for many important global distributed tasks, including computing minimum spanning trees, approximate shortest paths, and part-wise aggregates. △ Less

Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: To appear at STOC 2024

arXiv:2403.02522 [pdf, other]

HeAR -- Health Acoustic Representations

Authors: Sebastien Baur, Zaid Nabulsi, Wei-Hung Weng, Jake Garrison, Louis Blankemeier, Sam Fishman, Christina Chen, Sujay Kakarmath, Minyoi Maimbolwa, Nsala Sanjase, Brian Shuma, Yossi Matias, Greg S. Corrado, Shwetak Patel, Shravya Shetty, Shruthi Prabhakara, Monde Muyoyeta, Diego Ardila

Abstract: Health acoustic sounds such as coughs and breaths are known to contain useful health signals with significant potential for monitoring health and disease, yet are underexplored in the medical machine learning community. The existing deep learning systems for health acoustics are often narrowly trained and evaluated on a single task, which is limited by data and may hinder generalization to other t… ▽ More Health acoustic sounds such as coughs and breaths are known to contain useful health signals with significant potential for monitoring health and disease, yet are underexplored in the medical machine learning community. The existing deep learning systems for health acoustics are often narrowly trained and evaluated on a single task, which is limited by data and may hinder generalization to other tasks. To mitigate these gaps, we develop HeAR, a scalable self-supervised learning-based deep learning system using masked autoencoders trained on a large dataset of 313 million two-second long audio clips. Through linear probes, we establish HeAR as a state-of-the-art health audio embedding model on a benchmark of 33 health acoustic tasks across 6 datasets. By introducing this work, we hope to enable and accelerate further health acoustics research. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: 4 tables, 4 figures, 6 supplementary tables, 3 supplementary figures

arXiv:2402.14815 [pdf]

Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging

Authors: Yuzhe Yang, Yujia Liu, Xin Liu, Avanti Gulhane, Domenico Mastrodicasa, Wei Wu, Edward J Wang, Dushyant W Sahani, Shwetak Patel

Abstract: Advances in artificial intelligence (AI) have achieved expert-level performance in medical imaging applications. Notably, self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations. However, it is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically margin… ▽ More Advances in artificial intelligence (AI) have achieved expert-level performance in medical imaging applications. Notably, self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations. However, it is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically marginalized groups such as females or Black patients. The manifestation of such biases could systematically delay essential medical care for certain patient subgroups. In this study, we investigate the algorithmic fairness of state-of-the-art vision-language foundation models in chest X-ray diagnosis across five globally-sourced datasets. Our findings reveal that compared to board-certified radiologists, these foundation models consistently underdiagnose marginalized groups, with even higher rates seen in intersectional subgroups, such as Black female patients. Such demographic biases present over a wide range of pathologies and demographic attributes. Further analysis of the model embedding uncovers its significant encoding of demographic information. Deploying AI systems with these biases in medical imaging can intensify pre-existing care disparities, posing potential challenges to equitable healthcare access and raising ethical questions about their clinical application. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: Code and data are available at https://github.com/YyzHarry/vlm-fairness

arXiv:2402.08832 [pdf, other]

Intelligent Agricultural Management Considering N$_2$O Emission and Climate Variability with Uncertainties

Authors: Zhaoan Wang, Shaoping Xiao, Jun Wang, Ashwin Parab, Shivam Patel

Abstract: This study examines how artificial intelligence (AI), especially Reinforcement Learning (RL), can be used in farming to boost crop yields, fine-tune nitrogen use and watering, and reduce nitrate runoff and greenhouse gases, focusing on Nitrous Oxide (N$_2$O) emissions from soil. Facing climate change and limited agricultural knowledge, we use Partially Observable Markov Decision Processes (POMDPs)… ▽ More This study examines how artificial intelligence (AI), especially Reinforcement Learning (RL), can be used in farming to boost crop yields, fine-tune nitrogen use and watering, and reduce nitrate runoff and greenhouse gases, focusing on Nitrous Oxide (N$_2$O) emissions from soil. Facing climate change and limited agricultural knowledge, we use Partially Observable Markov Decision Processes (POMDPs) with a crop simulator to model AI agents' interactions with farming environments. We apply deep Q-learning with Recurrent Neural Network (RNN)-based Q networks for training agents on optimal actions. Also, we develop Machine Learning (ML) models to predict N$_2$O emissions, integrating these predictions into the simulator. Our research tackles uncertainties in N$_2$O emission estimates with a probabilistic ML approach and climate variability through a stochastic weather model, offering a range of emission outcomes to improve forecast reliability and decision-making. By incorporating climate change effects, we enhance agents' climate adaptability, aiming for resilient agricultural practices. Results show these agents can align crop productivity with environmental concerns by penalizing N$_2$O emissions, adapting effectively to climate shifts like warmer temperatures and less rain. This strategy improves farm management under climate change, highlighting AI's role in sustainable agriculture. △ Less

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.00076 [pdf, ps, other]

Exploitation Strategies in Conditional Markov Chain Search: A case study on the three-index assignment problem

Authors: Sahil Patel, Daniel Karapetyan

Abstract: The Conditional Markov Chain Search (CMCS) is a framework for automated design of metaheuristics for discrete combinatorial optimisation problems. Given a set of algorithmic components such as hill climbers and mutations, CMCS decides in which order to apply those components. The decisions are dictated by the CMCS configuration that can be learnt offline. CMCS does not have an acceptance criterion… ▽ More The Conditional Markov Chain Search (CMCS) is a framework for automated design of metaheuristics for discrete combinatorial optimisation problems. Given a set of algorithmic components such as hill climbers and mutations, CMCS decides in which order to apply those components. The decisions are dictated by the CMCS configuration that can be learnt offline. CMCS does not have an acceptance criterion; any moves are accepted by the framework. As a result, it is particularly good in exploration but is not as good at exploitation. In this study, we explore several extensions of the framework to improve its exploitation abilities. To perform a computational study, we applied the framework to the three-index assignment problem. The results of our experiments showed that a two-stage CMCS is indeed superior to a single-stage CMCS. △ Less

Submitted 30 January, 2024; originally announced February 2024.

Comments: 14 pages

arXiv:2401.07889 [pdf]

Machine Learning Techniques to Identify Hand Gestures amidst Forearm Muscle Signals

Authors: Ryan Cho, Sunil Patel, Kyu Taek Cho, Jaejin Hwang

Abstract: This study investigated the use of forearm EMG data for distinguishing eight hand gestures, employing the Neural Network and Random Forest algorithms on data from ten participants. The Neural Network achieved 97 percent accuracy with 1000-millisecond windows, while the Random Forest achieved 85 percent accuracy with 200-millisecond windows. Larger window sizes improved gesture classification due t… ▽ More This study investigated the use of forearm EMG data for distinguishing eight hand gestures, employing the Neural Network and Random Forest algorithms on data from ten participants. The Neural Network achieved 97 percent accuracy with 1000-millisecond windows, while the Random Forest achieved 85 percent accuracy with 200-millisecond windows. Larger window sizes improved gesture classification due to increased temporal resolution. The Random Forest exhibited faster processing at 92 milliseconds, compared to the Neural Network's 124 milliseconds. In conclusion, the study identified a Neural Network with a 1000-millisecond stream as the most accurate (97 percent), and a Random Forest with a 200-millisecond stream as the most efficient (85 percent). Future research should focus on increasing sample size, incorporating more hand gestures, and exploring different feature extraction methods and modeling algorithms to enhance system accuracy and efficiency. △ Less

Submitted 15 January, 2024; originally announced January 2024.

Comments: 21 pages, 7 figures

arXiv:2312.09888 [pdf, other]

doi 10.1145/3624062.3624159

Scaling Computational Fluid Dynamics: In Situ Visualization of NekRS using SENSEI

Authors: Victor A. Mateevitsi, Mathis Bode, Nicola Ferrier, Paul Fischer, Jens Henrik Göbbert, Joseph A. Insley, Yu-Hsiang Lan, Misun Min, Michael E. Papka, Saumil Patel, Silvio Rizzi, Jonathan Windgassen

Abstract: In the realm of Computational Fluid Dynamics (CFD), the demand for memory and computation resources is extreme, necessitating the use of leadership-scale computing platforms for practical domain sizes. This intensive requirement renders traditional checkpointing methods ineffective due to the significant slowdown in simulations while saving state data to disk. As we progress towards exascale and G… ▽ More In the realm of Computational Fluid Dynamics (CFD), the demand for memory and computation resources is extreme, necessitating the use of leadership-scale computing platforms for practical domain sizes. This intensive requirement renders traditional checkpointing methods ineffective due to the significant slowdown in simulations while saving state data to disk. As we progress towards exascale and GPU-driven High-Performance Computing (HPC) and confront larger problem sizes, the choice becomes increasingly stark: to compromise data fidelity or to reduce resolution. To navigate this challenge, this study advocates for the use of in situ analysis and visualization techniques. These allow more frequent data "snapshots" to be taken directly from memory, thus avoiding the need for disruptive checkpointing. We detail our approach of instrumenting NekRS, a GPU-focused thermal-fluid simulation code employing the spectral element method (SEM), and describe varied in situ and in transit strategies for data rendering. Additionally, we provide concrete scientific use-cases and report on runs performed on Polaris, Argonne Leadership Computing Facility's (ALCF) 44 Petaflop supercomputer and Jülich Wizard for European Leadership Science (JUWELS) Booster, Jülich Supercomputing Centre's (JSC) 71 Petaflop High Performance Computing (HPC) system, offering practical insight into the implications of our methodology. △ Less

Submitted 18 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.07627 [pdf]

Multimodal Sentiment Analysis: Perceived vs Induced Sentiments

Authors: Aditi Aggarwal, Deepika Varshney, Saurabh Patel

Abstract: Social media has created a global network where people can easily access and exchange vast information. This information gives rise to a variety of opinions, reflecting both positive and negative viewpoints. GIFs stand out as a multimedia format offering a visually engaging way for users to communicate. In this research, we propose a multimodal framework that integrates visual and textual features… ▽ More Social media has created a global network where people can easily access and exchange vast information. This information gives rise to a variety of opinions, reflecting both positive and negative viewpoints. GIFs stand out as a multimedia format offering a visually engaging way for users to communicate. In this research, we propose a multimodal framework that integrates visual and textual features to predict the GIF sentiment. It also incorporates attributes including face emotion detection and OCR generated captions to capture the semantic aspects of the GIF. The developed classifier achieves an accuracy of 82.7% on Twitter GIFs, which is an improvement over state-of-the-art models. Moreover, we have based our research on the ReactionGIF dataset, analysing the variance in sentiment perceived by the author and sentiment induced in the reader △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06652 [pdf, other]

Building Domain-Specific LLMs Faithful To The Islamic Worldview: Mirage or Technical Possibility?

Authors: Shabaz Patel, Hassan Kane, Rayhan Patel

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across numerous natural language understanding use cases. However, this impressive performance comes with inherent limitations, such as the tendency to perpetuate stereotypical biases or fabricate non-existent facts. In the context of Islam and its representation, accurate and factual representation of its beliefs and teachings… ▽ More Large Language Models (LLMs) have demonstrated remarkable performance across numerous natural language understanding use cases. However, this impressive performance comes with inherent limitations, such as the tendency to perpetuate stereotypical biases or fabricate non-existent facts. In the context of Islam and its representation, accurate and factual representation of its beliefs and teachings rooted in the Quran and Sunnah is key. This work focuses on the challenge of building domain-specific LLMs faithful to the Islamic worldview and proposes ways to build and evaluate such systems. Firstly, we define this open-ended goal as a technical problem and propose various solutions. Subsequently, we critically examine known challenges inherent to each approach and highlight evaluation methodologies that can be used to assess such systems. This work highlights the need for high-quality datasets, evaluations, and interdisciplinary work blending machine learning with Islamic scholarship. △ Less

Submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted for Muslims in ML workshop at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2312.03259 [pdf, other]

f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization

Authors: Sina Baharlouei, Shivam Patel, Meisam Razaviyayn

Abstract: Training and deploying machine learning models that meet fairness criteria for protected groups are fundamental in modern artificial intelligence. While numerous constraints and regularization terms have been proposed in the literature to promote fairness in machine learning tasks, most of these methods are not amenable to stochastic optimization due to the complex and nonlinear structure of const… ▽ More Training and deploying machine learning models that meet fairness criteria for protected groups are fundamental in modern artificial intelligence. While numerous constraints and regularization terms have been proposed in the literature to promote fairness in machine learning tasks, most of these methods are not amenable to stochastic optimization due to the complex and nonlinear structure of constraints and regularizers. Here, the term "stochastic" refers to the ability of the algorithm to work with small mini-batches of data. Motivated by the limitation of existing literature, this paper presents a unified stochastic optimization framework for fair empirical risk minimization based on f-divergence measures (f-FERM). The proposed stochastic algorithm enjoys theoretical convergence guarantees. In addition, our experiments demonstrate the superiority of fairness-accuracy tradeoffs offered by f-FERM for almost all batch sizes (ranging from full-batch to batch size of one). Moreover, we show that our framework can be extended to the case where there is a distribution shift from training to the test data. Our extension is based on a distributionally robust optimization reformulation of f-FERM objective under $L_p$ norms as uncertainty sets. Again, in this distributionally robust setting, f-FERM not only enjoys theoretical convergence guarantees but also outperforms other baselines in the literature in the tasks involving distribution shifts. An efficient stochastic implementation of $f$-FERM is publicly available. △ Less

Submitted 7 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: 24 Pages,5 figures

Journal ref: ICLR 2024

arXiv:2312.00164 [pdf, other]

Towards Accurate Differential Diagnosis with Large Language Models

Authors: Daniel McDuff, Mike Schaekermann, Tao Tu, Anil Palepu, Amy Wang, Jake Garrison, Karan Singhal, Yash Sharma, Shekoofeh Azizi, Kavita Kulkarni, Le Hou, Yong Cheng, Yun Liu, S Sara Mahdavi, Sushant Prakash, Anupam Pathak, Christopher Semturs, Shwetak Patel, Dale R Webster, Ewa Dominowska, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias , et al. (3 additional authors not shown)

Abstract: An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM op… ▽ More An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise. △ Less

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.17133 [pdf, other]

Deployment of a Robust and Explainable Mortality Prediction Model: The COVID-19 Pandemic and Beyond

Authors: Jacob R. Epifano, Stephen Glass, Ravi P. Ramachandran, Sharad Patel, Aaron J. Masino, Ghulam Rasool

Abstract: This study investigated the performance, explainability, and robustness of deployed artificial intelligence (AI) models in predicting mortality during the COVID-19 pandemic and beyond. The first study of its kind, we found that Bayesian Neural Networks (BNNs) and intelligent training techniques allowed our models to maintain performance amidst significant data shifts. Our results emphasize the imp… ▽ More This study investigated the performance, explainability, and robustness of deployed artificial intelligence (AI) models in predicting mortality during the COVID-19 pandemic and beyond. The first study of its kind, we found that Bayesian Neural Networks (BNNs) and intelligent training techniques allowed our models to maintain performance amidst significant data shifts. Our results emphasize the importance of developing robust AI models capable of matching or surpassing clinician predictions, even under challenging conditions. Our exploration of model explainability revealed that stochastic models generate more diverse and personalized explanations thereby highlighting the need for AI models that provide detailed and individualized insights in real-world clinical settings. Furthermore, we underscored the importance of quantifying uncertainty in AI models which enables clinicians to make better-informed decisions based on reliable predictions. Our study advocates for prioritizing implementation science in AI research for healthcare and ensuring that AI solutions are practical, beneficial, and sustainable in real-world clinical environments. By addressing unique challenges and complexities in healthcare settings, researchers can develop AI models that effectively improve clinical practice and patient outcomes. △ Less

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.16213 [pdf, other]

Seeing Beyond Cancer: Multi-Institutional Validation of Object Localization and 3D Semantic Segmentation using Deep Learning for Breast MRI

Authors: Arda Pekis, Vignesh Kannan, Evandros Kaklamanos, Anu Antony, Snehal Patel, Tyler Earnest

Abstract: The clinical management of breast cancer depends on an accurate understanding of the tumor and its anatomical context to adjacent tissues and landmark structures. This context may be provided by semantic segmentation methods; however, previous works have been largely limited to a singular focus on the tumor alone and rarely other tissue types. In contrast, we present a method that exploits tissue-… ▽ More The clinical management of breast cancer depends on an accurate understanding of the tumor and its anatomical context to adjacent tissues and landmark structures. This context may be provided by semantic segmentation methods; however, previous works have been largely limited to a singular focus on the tumor alone and rarely other tissue types. In contrast, we present a method that exploits tissue-tissue interactions to accurately segment every major tissue type in the breast including: chest wall, skin, adipose tissue, fibroglandular tissue, vasculature and tumor via standard-of-care Dynamic Contrast Enhanced MRI. Comparing our method to prior state-of-the-art, we achieved a superior Dice score on tumor segmentation while maintaining competitive performance on other studied tissues across multiple institutions. Briefly, our method proceeds by localizing the tumor using 2D object detectors, then segmenting the tumor and surrounding tissues independently using two 3D U-nets, and finally integrating these results while mitigating false positives by checking for anatomically plausible tissue-tissue contacts. The object detection models were pre-trained on ImageNet and COCO, and operated on MIP (maximum intensity projection) images in the axial and sagittal planes, establishing a 3D tumor bounding box. By integrating multiple relevant peri-tumoral tissues, our work enables clinical applications in breast cancer staging, prognosis and surgical planning. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: 9 pages, 2 figures, to appear in SPIE: Medical Imaging 2024

ACM Class: I.4.6; J.3

arXiv:2311.13063 [pdf, other]

doi 10.1145/3659604

From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models

Authors: Zachary Englhardt, Chengqian Ma, Margaret E. Morris, Xuhai "Orson" Xu, Chun-Cheng Chang, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, Vikram Iyer

Abstract: Passively collected behavioral health data from ubiquitous sensors holds significant promise to provide mental health professionals insights from patient's daily lives; however, developing analysis tools to use this data in clinical practice requires addressing challenges of generalization across devices and weak or ambiguous correlations between the measured signals and an individual's mental hea… ▽ More Passively collected behavioral health data from ubiquitous sensors holds significant promise to provide mental health professionals insights from patient's daily lives; however, developing analysis tools to use this data in clinical practice requires addressing challenges of generalization across devices and weak or ambiguous correlations between the measured signals and an individual's mental health. To address these challenges, we take a novel approach that leverages large language models (LLMs) to synthesize clinically useful insights from multi-sensor data. We develop chain of thought prompting methods that use LLMs to generate reasoning about how trends in data such as step count and sleep relate to conditions like depression and anxiety. We first demonstrate binary depression classification with LLMs achieving accuracies of 61.1% which exceed the state of the art. While it is not robust for clinical use, this leads us to our key finding: even more impactful and valued than classification is a new human-AI collaboration approach in which clinician experts interactively query these tools and combine their domain expertise and context about the patient with AI generated reasoning to support clinical decision-making. We find models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data. △ Less

Submitted 25 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.09611 [pdf, other]

DeltaLCA: Comparative Life-Cycle Assessment for Electronics Design

Authors: Zhihan Zhang, Felix Hähnlein, Yuxuan Mei, Zachary Englhardt, Shwetak Patel, Adriana Schulz, Vikram Iyer

Abstract: Reducing the environmental footprint of electronics and computing devices requires new tools that empower designers to make informed decisions about sustainability during the design process itself. This is not possible with current tools for life cycle assessment (LCA) which require substantial domain expertise and time to evaluate the numerous chips and other components that make up a device. We… ▽ More Reducing the environmental footprint of electronics and computing devices requires new tools that empower designers to make informed decisions about sustainability during the design process itself. This is not possible with current tools for life cycle assessment (LCA) which require substantial domain expertise and time to evaluate the numerous chips and other components that make up a device. We observe first that informed decision-making does not require absolute metrics and can instead be done by comparing designs. Second, we can use domain-specific heuristics to perform these comparisons. We combine these insights to develop DeltaLCA, an open-source interactive design tool that addresses the dual challenges of automating life cycle inventory generation and data availability by performing comparative analyses of electronics designs. Users can upload standard design files from Electronic Design Automation (EDA) software and the tool will guide them through determining which one has greater carbon footprint. DeltaLCA leverages electronics-specific LCA datasets and heuristics and tries to automatically rank the two designs, prompting users to provide additional information only when necessary. We show through case studies DeltaLCA achieves the same result as evaluating full LCAs, and that it accelerates LCA comparisons from eight expert-hours to a single click for devices with ~30 components, and 15 minutes for more complex devices with ~100 components. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.05371 [pdf, other]

Training Robust Deep Physiological Measurement Models with Synthetic Video-based Data

Authors: Yuxuan Ou, Yuzhe Zhang, Yuntang Wang, Shwetak Patel, Daniel McDuf, Yuzhe Yang, Xin Liu

Abstract: Recent advances in supervised deep learning techniques have demonstrated the possibility to remotely measure human physiological vital signs (e.g., photoplethysmograph, heart rate) just from facial videos. However, the performance of these methods heavily relies on the availability and diversity of real labeled data. Yet, collecting large-scale real-world data with high-quality labels is typically… ▽ More Recent advances in supervised deep learning techniques have demonstrated the possibility to remotely measure human physiological vital signs (e.g., photoplethysmograph, heart rate) just from facial videos. However, the performance of these methods heavily relies on the availability and diversity of real labeled data. Yet, collecting large-scale real-world data with high-quality labels is typically challenging and resource intensive, which also raises privacy concerns when storing personal bio-metric data. Synthetic video-based datasets (e.g., SCAMPS \cite{mcduff2022scamps}) with photo-realistic synthesized avatars are introduced to alleviate the issues while providing high-quality synthetic data. However, there exists a significant gap between synthetic and real-world data, which hinders the generalization of neural models trained on these synthetic datasets. In this paper, we proposed several measures to add real-world noise to synthetic physiological signals and corresponding facial videos. We experimented with individual and combined augmentation methods and evaluated our framework on three public real-world datasets. Our results show that we were able to reduce the average MAE from 6.9 to 2.0. △ Less

Submitted 15 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.05198 [pdf, other]

doi 10.1109/CASE56687.2023.10260399

Indoor Localization for an Autonomous Model Car: A Marker-Based Multi-Sensor Fusion Framework

Authors: Xibo Li, Shruti Patel, David Stronzek-Pfeifer, Christof Büskens

Abstract: Global navigation satellite systems readily provide accurate position information when localizing a robot outdoors. However, an analogous standard solution does not exist yet for mobile robots operating indoors. This paper presents an integrated framework for indoor localization and experimental validation of an autonomous driving system based on an advanced driver-assistance system (ADAS) model c… ▽ More Global navigation satellite systems readily provide accurate position information when localizing a robot outdoors. However, an analogous standard solution does not exist yet for mobile robots operating indoors. This paper presents an integrated framework for indoor localization and experimental validation of an autonomous driving system based on an advanced driver-assistance system (ADAS) model car. The global pose of the model car is obtained by fusing information from fiducial markers, inertial sensors and wheel odometry. In order to achieve robust localization, we investigate and compare two extensions to the Extended Kalman Filter; first with adaptive noise tuning and second with Chi-squared test for measurement outlier detection. An efficient and low-cost ground truth measurement method using a single LiDAR sensor is also proposed to validate the results. The performance of the localization algorithms is tested on a complete autonomous driving system with trajectory planning and model predictive control. △ Less

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2309.10160 [pdf, other]

RadOnc-GPT: A Large Language Model for Radiation Oncology

Authors: Zhengliang Liu, Peilong Wang, Yiwei Li, Jason Holmes, Peng Shu, Lian Zhang, Chenbin Liu, Ninghao Liu, Dajiang Zhu, Xiang Li, Quanzheng Li, Samir H. Patel, Terence T. Sio, Tianming Liu, Wei Liu

Abstract: This paper presents RadOnc-GPT, a large language model specialized for radiation oncology through advanced tuning methods. RadOnc-GPT was finetuned on a large dataset of radiation oncology patient records from the Mayo Clinic in Arizona. The model employs instruction tuning on three key tasks - generating radiotherapy treatment regimens, determining optimal radiation modalities, and providing diag… ▽ More This paper presents RadOnc-GPT, a large language model specialized for radiation oncology through advanced tuning methods. RadOnc-GPT was finetuned on a large dataset of radiation oncology patient records from the Mayo Clinic in Arizona. The model employs instruction tuning on three key tasks - generating radiotherapy treatment regimens, determining optimal radiation modalities, and providing diagnostic descriptions/ICD codes based on patient diagnostic details. Evaluations conducted by comparing RadOnc-GPT outputs to general large language model outputs showed higher ROUGE scores in these three tasks. The study demonstrated the potential of using large language models fine-tuned using domain-specific knowledge like RadOnc-GPT to achieve transformational capabilities in highly specialized healthcare fields such as radiation oncology. However, our model's clinical relevance requires confirmation, and it specializes in only the aforementioned three specific tasks and lacks broader applicability. Furthermore, its evaluation through ROUGE scores might not reflect the true semantic and clinical accuracy - challenges we intend to address in future research. △ Less

Submitted 5 November, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2308.14583 [pdf, other]

Learning to Read Analog Gauges from Synthetic Data

Authors: Juan Leon-Alcazar, Yazeed Alnumay, Cheng Zheng, Hassane Trigui, Sahejad Patel, Bernard Ghanem

Abstract: Manually reading and logging gauge data is time inefficient, and the effort increases according to the number of gauges available. We present a computer vision pipeline that automates the reading of analog gauges. We propose a two-stage CNN pipeline that identifies the key structural components of an analog gauge and outputs an angular reading. To facilitate the training of our approach, a synthet… ▽ More Manually reading and logging gauge data is time inefficient, and the effort increases according to the number of gauges available. We present a computer vision pipeline that automates the reading of analog gauges. We propose a two-stage CNN pipeline that identifies the key structural components of an analog gauge and outputs an angular reading. To facilitate the training of our approach, a synthetic dataset is generated thus obtaining a set of realistic analog gauges with their corresponding annotation. To validate our proposal, an additional real-world dataset was collected with 4.813 manually curated images. When compared against state-of-the-art methodologies, our method shows a significant improvement of 4.55 in the average error, which is a 52% relative improvement. The resources for this project will be made available at: https://github.com/fuankarion/automatic-gauge-reading. △ Less

Submitted 28 August, 2023; originally announced August 2023.

Journal ref: Winter Conference on Applications of Computer Vision 2024

arXiv:2308.14089 [pdf, other]

MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

Authors: Scott L. Fleming, Alejandro Lozano, William J. Haberkorn, Jenelle A. Jindal, Eduardo P. Reis, Rahul Thapa, Louis Blankemeier, Julian Z. Genkins, Ethan Steinberg, Ashwin Nayak, Birju S. Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott J. Adams, Oluseyi Fayanju, Shreya J. Shah, Thomas Savage, Ethan Goh, Akshay S. Chaudhari, Nima Aghaeepour, Christopher Sharp, Michael A. Pfeffer, Percy Liang , et al. (5 additional authors not shown)

Abstract: The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture… ▽ More The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and an 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences. △ Less

Submitted 24 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

arXiv:2308.05950 [pdf, other]

Blockchain-Based Transferable Digital Rights of Land

Authors: Ras Dwivedi, Sumit Patel, Prof. Sandeep Shukla

Abstract: Land, being a scarce and valuable resource, is in high demand, especially in densely populated areas of older cities. Development authorities require land for infrastructure projects and other amenities, while landowners hold onto their land for both its usage and its financial value. Transferable Development Rights (TDRs) serve as a mechanism to separate the development rights associated with the… ▽ More Land, being a scarce and valuable resource, is in high demand, especially in densely populated areas of older cities. Development authorities require land for infrastructure projects and other amenities, while landowners hold onto their land for both its usage and its financial value. Transferable Development Rights (TDRs) serve as a mechanism to separate the development rights associated with the land from the physical land itself. Development authorities acquire the land by offering compensation in the form of TDRs, which hold monetary value. In this paper, we present the tokenization of development rights, focusing on the implementation in collaboration with a development authority. While there have been previous implementations of land tokenization, we believe our approach is the first to tokenize development rights specifically. Our implementation addresses practical challenges related to record-keeping, ground verification of land, and the unique identification of stakeholders. We ensure the accurate evaluation of development rights by incorporating publicly available circle rates, which consider the ground development of the land and its surrounding areas. △ Less

Submitted 11 August, 2023; originally announced August 2023.

Comments: 5 pages, Paper presented in https://easychair.org/cfp/ICSF2023

arXiv:2307.07529 [pdf, other]

Learning Multiple Coordinated Agents under Directed Acyclic Graph Constraints

Authors: Jaeyeon Jang, Diego Klabjan, Han Liu, Nital S. Patel, Xiuqi Li, Balakrishnan Ananthanarayanan, Husam Dauod, Tzung-Han Juang

Abstract: This paper proposes a novel multi-agent reinforcement learning (MARL) method to learn multiple coordinated agents under directed acyclic graph (DAG) constraints. Unlike existing MARL approaches, our method explicitly exploits the DAG structure between agents to achieve more effective learning performance. Theoretically, we propose a novel surrogate value function based on a MARL model with synthet… ▽ More This paper proposes a novel multi-agent reinforcement learning (MARL) method to learn multiple coordinated agents under directed acyclic graph (DAG) constraints. Unlike existing MARL approaches, our method explicitly exploits the DAG structure between agents to achieve more effective learning performance. Theoretically, we propose a novel surrogate value function based on a MARL model with synthetic rewards (MARLM-SR) and prove that it serves as a lower bound of the optimal value function. Computationally, we propose a practical training algorithm that exploits new notion of leader agent and reward generator and distributor agent to guide the decomposed follower agents to better explore the parameter space in environments with DAG constraints. Empirically, we exploit four DAG environments including a real-world scheduling for one of Intel's high volume packaging and test factory to benchmark our methods and show it outperforms the other non-DAG approaches. △ Less

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.03817 [pdf, other]

Exploring and Characterizing Large Language Models For Embedded System Development and Debugging

Authors: Zachary Englhardt, Richard Li, Dilini Nissanka, Zhihan Zhang, Girish Narayanswamy, Joseph Breda, Xin Liu, Shwetak Patel, Vikram Iyer

Abstract: Large language models (LLMs) have shown remarkable abilities to generate code, however their ability to develop software for embedded systems, which requires cross-domain knowledge of hardware and software has not been studied. In this paper we develop an extensible, open source hardware-in-the-loop framework to systematically evaluate leading LLMs (GPT-3.5, GPT-4, PaLM 2) to assess their capabili… ▽ More Large language models (LLMs) have shown remarkable abilities to generate code, however their ability to develop software for embedded systems, which requires cross-domain knowledge of hardware and software has not been studied. In this paper we develop an extensible, open source hardware-in-the-loop framework to systematically evaluate leading LLMs (GPT-3.5, GPT-4, PaLM 2) to assess their capabilities and limitations for embedded system development. We observe through our study that even when these tools fail to produce working code, they consistently generate helpful reasoning about embedded design tasks. We leverage this finding to study how human programmers interact with these tools, and develop an human-AI based software engineering workflow for building embedded systems. Our evaluation platform for verifying LLM generated programs uses sensor actuator pairs for physical evaluation. We compare all three models with N=450 experiments and find surprisingly that GPT-4 especially shows an exceptional level of cross-domain understanding and reasoning, in some cases generating fully correct programs from a single prompt. In N=50 trials, GPT-4 produces functional I2C interfaces 66% of the time. GPT-4 also produces register-level drivers, code for LoRa communication, and context-specific power optimizations for an nRF52 program resulting in over 740x current reduction to 12.2uA. We also characterize the models' limitations to develop a generalizable human-AI workflow for using LLMs in embedded system development. We evaluate our workflow with 15 users including novice and expert programmers. We find that our workflow improves productivity for all users and increases the success rate for building a LoRa environmental sensor from 25% to 100%, including for users with zero hardware or C/C++ experience. △ Less

Submitted 21 November, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

arXiv:2306.03331 [pdf, ps, other]

A Robust Likelihood Model for Novelty Detection

Authors: Ranya Almohsen, Shivang Patel, Donald A. Adjeroh, Gianfranco Doretto

Abstract: Current approaches to novelty or anomaly detection are based on deep neural networks. Despite their effectiveness, neural networks are also vulnerable to imperceptible deformations of the input data. This is a serious issue in critical applications, or when data alterations are generated by an adversarial attack. While this is a known problem that has been studied in recent years for the case of s… ▽ More Current approaches to novelty or anomaly detection are based on deep neural networks. Despite their effectiveness, neural networks are also vulnerable to imperceptible deformations of the input data. This is a serious issue in critical applications, or when data alterations are generated by an adversarial attack. While this is a known problem that has been studied in recent years for the case of supervised learning, the case of novelty detection has received very limited attention. Indeed, in this latter setting the learning is typically unsupervised because outlier data is not available during training, and new approaches for this case need to be investigated. We propose a new prior that aims at learning a robust likelihood for the novelty test, as a defense against attacks. We also integrate the same prior with a state-of-the-art novelty detection approach. Because of the geometric properties of that approach, the resulting robust training is computationally very efficient. An initial evaluation of the method indicates that it is effective at improving performance with respect to the standard models in the absence and presence of attacks. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: CVPR Workshop on Computer Vision in the Wild, 2023

arXiv:2306.01938 [pdf, other]

Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images

Authors: Marcela Mera-Trujillo, Shivang Patel, Yu Gu, Gianfranco Doretto

Abstract: Keypoint detection and matching is a fundamental task in many computer vision problems, from shape reconstruction, to structure from motion, to AR/VR applications and robotics. It is a well-studied problem with remarkable successes such as SIFT, and more recent deep learning approaches. While great robustness is exhibited by these techniques with respect to noise, illumination variation, and rigid… ▽ More Keypoint detection and matching is a fundamental task in many computer vision problems, from shape reconstruction, to structure from motion, to AR/VR applications and robotics. It is a well-studied problem with remarkable successes such as SIFT, and more recent deep learning approaches. While great robustness is exhibited by these techniques with respect to noise, illumination variation, and rigid motion transformations, less attention has been placed on image distortion sensitivity. In this work, we focus on the case when this is caused by the geometry of the cameras used for image acquisition, and consider the keypoint detection and matching problem between the hybrid scenario of a fisheye and a projective image. We build on a state-of-the-art approach and derive a self-supervised procedure that enables training an interest point detector and descriptor network. We also collected two new datasets for additional training and testing in this unexplored scenario, and we demonstrate that current approaches are suboptimal because they are designed to work in traditional projective conditions, while the proposed approach turns out to be the most effective. △ Less

Submitted 2 June, 2023; originally announced June 2023.

Comments: CVPR Workshop on Omnidirectional Computer Vision, 2023

arXiv:2305.15525 [pdf, other]

Large Language Models are Few-Shot Health Learners

Authors: Xin Liu, Daniel McDuff, Geza Kovacs, Isaac Galatzer-Levy, Jacob Sunshine, Jiening Zhan, Ming-Zher Poh, Shun Liao, Paolo Di Achille, Shwetak Patel

Abstract: Large language models (LLMs) can capture rich representations of concepts that are useful for real-world tasks. However, language alone is limited. While existing LLMs excel at text-based inferences, health applications require that models be grounded in numerical data (e.g., vital signs, laboratory values in clinical domains; steps, movement in the wellness domain) that is not easily or readily e… ▽ More Large language models (LLMs) can capture rich representations of concepts that are useful for real-world tasks. However, language alone is limited. While existing LLMs excel at text-based inferences, health applications require that models be grounded in numerical data (e.g., vital signs, laboratory values in clinical domains; steps, movement in the wellness domain) that is not easily or readily expressed as text in existing training corpus. We demonstrate that with only few-shot tuning, a large language model is capable of grounding various physiological and behavioral time-series data and making meaningful inferences on numerous health tasks for both clinical and wellness contexts. Using data from wearable and medical sensor recordings, we evaluate these capabilities on the tasks of cardiac signal analysis, physical activity recognition, metabolic calculation (e.g., calories burned), and estimation of stress reports and mental health screeners. △ Less

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.10404 [pdf, ps, other]

$\mathbb{F}_q\mathcal{R}$-skew cyclic codes and their application to quantum codes

Authors: Om Prakash, Shikha Patel, Habibul Islam

Abstract: Let $p$ be a prime and $\mathbb{F}_q$ be the finite field of order $q=p^m$. In this paper, we study $\mathbb{F}_q\mathcal{R}$-skew cyclic codes where $\mathcal{R}=\mathbb{F}_q+u\mathbb{F}_q$ with $u^2=u$. To characterize $\mathbb{F}_q\mathcal{R}$-skew cyclic codes, we first establish their algebraic structure and then discuss the dual-containing properties by considering a non-degenerate inner pro… ▽ More Let $p$ be a prime and $\mathbb{F}_q$ be the finite field of order $q=p^m$. In this paper, we study $\mathbb{F}_q\mathcal{R}$-skew cyclic codes where $\mathcal{R}=\mathbb{F}_q+u\mathbb{F}_q$ with $u^2=u$. To characterize $\mathbb{F}_q\mathcal{R}$-skew cyclic codes, we first establish their algebraic structure and then discuss the dual-containing properties by considering a non-degenerate inner product. Further, we define a Gray map over $\mathbb{F}_q\mathcal{R}$ and obtain their $\mathbb{F}_q$-Gray images. As an application, we apply the CSS (Calderbank-Shor-Steane) construction on Gray images of dual containing $\mathbb{F}_q\mathcal{R}$-skew cyclic codes and obtain many quantum codes with better parameters than the best-known codes available in the literature. △ Less

Submitted 17 May, 2023; originally announced May 2023.

Comments: 17 pages. This paper is under modification in the Quantum Information Processing

MSC Class: 11T71; 11T06; 94B05; 94B15

arXiv:2305.06161 [pdf, other]

StarCoder: may the source be with you!

Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu , et al. (42 additional authors not shown)

Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle… ▽ More The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40\% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license. △ Less

Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Showing 1–50 of 180 results for author: Patel, S