Search | arXiv e-print repository

doi 10.1145/3626203.3670536

Benchmarking with Supernovae: A Performance Study of the FLASH Code

Authors: Joshua Martin, Catherine Feldman, Eva Siegmann, Tony Curtis, David Carlson, Firat Coskun, Daniel Wood, Raul Gonzalez, Robert J. Harrison, Alan C. Calder

Abstract: Astrophysical simulations are computation, memory, and thus energy intensive, thereby requiring new hardware advances for progress. Stony Brook University recently expanded its computing cluster "SeaWulf" with an addition of 94 new nodes featuring Intel Sapphire Rapids Xeon Max series CPUs. We present a performance and power efficiency study of this hardware performed with FLASH: a multi-scale, multi-physics, adaptive mesh-based software instrument. We extend this study to compare performance to that of Stony Brook's Ookami testbed which features ARM-based A64FX-700 processors, and SeaWulf's AMD EPYC Milan and Intel Skylake nodes. Our application is a stellar explosion known as a thermonuclear (Type Ia) supernova and for this 3D problem, FLASH includes operators for hydrodynamics, gravity, and nuclear burning, in addition to routines for the material equation of state. We perform a strong-scaling study with a 220 GB problem size to explore both single- and multi-node performance. Our study explores the performance of different MPI mappings and the distribution of processors across nodes. From these tests, we determined the optimal configuration to balance runtime and energy consumption for our application.

Submitted 28 August, 2024; originally announced August 2024.

Comments: Accepted to PEARC '24 (Practice and Experience in Advanced Research Computing)

Journal ref: Practice and Experience in Advanced Research Computing 2024 Article no.8

arXiv:2408.07860 [pdf]

A Novel Generative Artificial Intelligence Method for Interference Study on Multiplex Brightfield Immunohistochemistry Images

Authors: Satarupa Mukherjee, Jim Martin, Yao Nie

Abstract: Multiplex brightfield imaging offers the advantage of simultaneously analyzing multiple biomarkers on a single slide, as opposed to single biomarker labeling on multiple consecutive slides. To accurately analyze multiple biomarkers localized at the same cellular compartment, two representative biomarker sets were selected as assay models - cMET-PDL1-EGFR and CD8-LAG3-PDL1, where all three biomarkers can co-localize on the cell membrane. One of the most crucial preliminary stages for analyzing such assays is identifying each unique chromogen on individual cells. This is a challenging problem due to the co-localization of membrane stains from all three biomarkers. It requires advanced color unmixing for creating the equivalent singleplex images from each triplex image for each biomarker. In this project, we developed a cycle-Generative Adversarial Network (cycle-GAN) method for unmixing the triplex images generated from the above-mentioned assays. Three different models were designed to generate the singleplex image for each of the three stains Tamra (purple), QM-Dabsyl (yellow) and Green. A notable novelty of our approach was that the inputs to the network were images in the optical density domain instead of conventionally used RGB images. The use of the optical density domain helped in reducing the blurriness of the synthetic singleplex images, which was often observed when the network was trained on RGB images.
The cycle-GAN models were validated on 10,800 lung, gastric and colon images for the cMET-PDL1-EGFR assay and 3600 colon images for the CD8-LAG3-PDL1 assay. Visual as well as quantified assessments demonstrated that the proposed method is effective and efficient when compared with the manual reviewing results and is readily applicable to various multiplex assays.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2407.13467 [pdf]

Personal Data Transfers to Non-EEA Domains: A Tool for Citizens and An Analysis on Italian Public Administration Websites

Authors: Lorenzo Laudadio, Antonio Vetrò, Riccardo Coppola, Juan Carlos De Martin, Marco Torchiano

Abstract: Six years after the entry into force of the GDPR, European companies and organizations still have difficulties complying with it: the amount of fines issued by the European data protection authorities is continuously increasing. Personal data transfers are no exception. In this work we analyse the personal data transfers from more than 20000 Italian Public Administration (PA) entities to third countries. We developed "Minos", a user-friendly application which allows users to navigate the web while recording HTTP requests. Then, we used the back-end of Minos to automate the analysis. We found that about 14% of the PA websites transferred data out of the European Economic Area (EEA). This number is an underestimation because only visits to the home pages were the object of the analysis. The top 3 destinations of the data transfers are Amazon, Google and Fonticons, accounting for about 70% of the bad requests. The most recurrent services which are the object of the requests are cloud computing services and content delivery networks (CDNs). Our results highlight that, in Italy, a relevant portion of public administration websites transfers personal data to non-EEA countries. In terms of technology policy, these results stress the need for further incentives to improve the PA digital infrastructures. Finally, while working on refinements of Minos, the version here described is openly available on Zenodo: it can be helpful to a variety of actors (citizens, researchers, activists, policy makers) to increase awareness and enlarge the investigation.

Submitted 18 July, 2024; originally announced July 2024.

Comments: International Conference on Information Technology for Social Good (GoodIT '24), September 4-6, 2024, Bremen, Germany

arXiv:2407.11988 [pdf, other]

Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing

Authors: Shafiuddin Rehan Ahmed, Zhiyong Eric Wang, George Arthur Baker, Kevin Stowe, James H. Martin

Abstract: The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two issues by introducing ECB+META, a lexically rich variant of Event Coref Bank Plus (ECB+) for CDEC on symbolic and metaphoric language. We use ChatGPT as a tool for the metaphoric transformation of sentences in the documents of ECB+, then tag the original event triggers in the transformed sentences in a semi-automated manner. In this way, we avoid the re-annotation of expensive coreference links. We present results that show existing methods that work well on ECB+ struggle with ECB+META, thereby paving the way for CDEC research on a much more challenging dataset. Code/data: https://github.com/ahmeshaf/llms_coref

Submitted 5 June, 2024; originally announced July 2024.

Comments: Short Paper, ACL 2024

arXiv:2407.10554 [pdf, other]

Beyond Generative Artificial Intelligence: Roadmap for Natural Language Generation

Authors: María Miró Maestre, Iván Martínez-Murillo, Tania J. Martin, Borja Navarro-Colorado, Antonio Ferrández, Armando Suárez Cueto, Elena Lloret

Abstract: Generative Artificial Intelligence has grown exponentially as a result of Large Language Models (LLMs). This has been possible because of the impressive performance of deep learning methods created within the field of Natural Language Processing (NLP) and its subfield Natural Language Generation (NLG), which is the focus of this paper. The growing LLM family includes the popular GPT-4 and Bard; more specifically, tools such as ChatGPT have become a benchmark for other LLMs when solving most of the tasks involved in NLG research. This scenario poses new questions about the next steps for NLG and how the field can adapt and evolve to deal with new challenges in the era of LLMs. To address this, the present paper conducts a review of a representative sample of surveys recently published in NLG. By doing so, we aim to provide the scientific community with a research roadmap to identify which NLG aspects are still not suitably addressed by LLMs, as well as suggest future lines of research that should be addressed going forward.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2406.19561 [pdf, other]

Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning

Authors: Bradley Burega, John D. Martin, Luke Kapeluck, Michael Bowling

Abstract: We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment. This is particularly challenging when the learning system is resource-constrained and in continual settings, where the environment dynamics change. To address these challenges, our paper introduces an online, meta-gradient algorithm that tunes a probability with which states are queried during Dyna-style planning. Our study compares the aggregate, empirical performance of this meta-gradient method to baselines that employ conventional sampling strategies. Results indicate that our method improves efficiency of the planning process, which, as a consequence, improves the sample-efficiency of the overall learning process. On the whole, we observe that our meta-learned solutions avoid several pathologies of conventional planning approaches, such as sampling inaccurate transitions and those that stall credit assignment. We believe these findings could prove useful, in future work, for designing model-based RL systems at scale.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.11880 [pdf, other]

Knowledge Return Oriented Prompting (KROP)

Authors: Jason Martin, Kenneth Yeung

Abstract: Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.11036 [pdf, other]

garak: A Framework for Security Probing Large Language Models

Authors: Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie

Abstract: As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output, are constantly updated, and the potential adversary is highly diverse: anyone with access to the internet and a decent command of natural language. Further, what constitutes a security weakness in one context may not be an issue in a different context; one-fits-all guardrails remain theoretical. In this paper, we argue that it is time to rethink what constitutes "LLM security", and pursue a holistic approach to LLM security evaluation, where exploration and discovery of issues are central. To this end, this paper introduces garak (Generative AI Red-teaming and Assessment Kit), a framework which can be used to discover and identify vulnerabilities in a target LLM or dialog system. garak probes an LLM in a structured fashion to discover potential vulnerabilities. The outputs of the framework describe a target model's weaknesses, contribute to an informed discussion of what composes vulnerabilities in unique contexts, and can inform alignment and policy discussions for LLM deployment.

Submitted 16 June, 2024; originally announced June 2024.

Comments: https://garak.ai

arXiv:2406.00225 [pdf, other]

Kinematic Model of Magnetic Domain Wall Motion for Fast, High-Accuracy Simulations

Authors: Kristi Doleh, Leonard Humphrey, Chandler M. Linseisen, Michael D. Kitcher, Joanna M. Martin, Can Cui, Jean Anne C. Incorvia, Felipe Garcia-Sanchez, Naimul Hassan, Alexander J. Edwards, Joseph S. Friedman

Abstract: Domain wall (DW) devices have garnered recent interest for diverse applications including memory, logic, and neuromorphic primitives; fast, accurate device models are therefore imperative for large-scale system design and verification. Extant DW motion models are sub-optimal for large-scale system design, either over-consuming compute resources with physics-heavy equations or oversimplifying the physics, drastically reducing model accuracy. We propose a DW model inspired by the phenomenological similarities between motions of a DW and a classical object being acted on by forces like air resistance or static friction. Our proposed phenomenological model predicts DW motion within 1.2% on average compared with micromagnetic simulations that are 400 times slower. Additionally, our model is seven times faster than extant collective coordinate models and 14 times more accurate than extant hyper-reduced models, making it an essential tool for large-scale DW circuit design and simulation. The model is publicly posted along with scripts that automatically extract model parameters from user-provided simulation or experimental data to extend the model to alternative micromagnetic parameters.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.09153 [pdf, other]

Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser

Authors: Jon Z. Cai, Kristin Wright-Bettner, Martha Palmer, Guergana K. Savova, James H. Martin

Abstract: This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME) corpus, we adapted a state-of-the-art AMR parser utilizing continuous training. Our approach incorporates data augmentation techniques to enhance the accuracy of AMR structure predictions. Notably, through this learning strategy, our parser achieved an impressive F1 score of 88% on the THYME corpus's colon cancer dataset. Moreover, our research delved into the efficacy of data required for domain adaptation within the realm of clinical notes, presenting domain adaptation data requirements for AMR parsing. This exploration not only underscores the parser's robust performance but also highlights its potential in facilitating a deeper understanding of clinical narratives through structured semantic representations.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to the 6th Clinical NLP Workshop at NAACL, 2024

arXiv:2404.17730 [pdf, other]

Bridging the Social & Technical Divide in Augmentative and Alternative Communication (AAC) Applications for Autistic Adults

Authors: Lara J. Martin, Malathy Nagalakshmi

Abstract: Natural Language Processing (NLP) techniques are being used more frequently to improve high-tech Augmentative and Alternative Communication (AAC), but many of these techniques are integrated without the inclusion of the users' perspectives. As many of these tools are created with children in mind, autistic adults are often neglected in the design of AAC tools to begin with. We conducted in-depth interviews with 12 autistic adults to find the pain points of current AAC and determine what general technological advances they would find helpful. We found that in addition to technological issues, there are many societal issues as well. We found 9 different categories of themes from our interviews: input options, output options, selecting or adapting AAC for a good fit, when to start or swap AAC, benefits (of use), access (to AAC), stumbling blocks for continued use, social concerns, and lack of control. In this paper, we go through these nine categories in depth and then suggest possible guidelines for the NLP community, AAC application makers, and policy makers to improve AAC use for autistic adults.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.08949 [pdf, other]

Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Authors: Abhijnan Nath, Huma Jamil, Shafiuddin Rehan Ahmed, George Baker, Rahul Ghosh, James H. Martin, Nathaniel Blanchard, Nikhil Krishnaswamy

Abstract: Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models. As existing ECR benchmark datasets rarely provide images for all event mentions, we augment the popular ECB+ dataset with event-centric images scraped from the internet and generated using image diffusion models. We establish three methods that incorporate images and text for coreference: 1) a standard fused model with finetuning, 2) a novel linear mapping method without finetuning and 3) an ensembling approach based on splitting mention pairs by semantic and discourse-level difficulty. We evaluate on 2 datasets: the augmented ECB+, and AIDA Phase 1. Our ensemble systems using cross-modal linear mapping establish an upper limit (91.9 CoNLL F1) on ECB+ ECR performance given the preprocessing assumptions used, and establish a novel baseline on AIDA Phase 1. Our results demonstrate the utility of multimodal information in ECR for certain challenging coreference problems, and highlight a need for more multimodal resources in the coreference resolution space.

Submitted 13 April, 2024; originally announced April 2024.

Comments: To appear at LREC-COLING 2024

arXiv:2404.08656 [pdf, other]

Linear Cross-document Event Coreference Resolution with X-AMR

Authors: Shafiuddin Rehan Ahmed, George Arthur Baker, Evi Judge, Michael Regan, Kristin Wright-Bettner, Martha Palmer, James H. Martin

Abstract: Event Coreference Resolution (ECR) as a pairwise mention classification task is expensive both for automated systems and manual annotations. The task's quadratic difficulty is exacerbated when using Large Language Models (LLMs), making prompt engineering for ECR prohibitively costly. In this work, we propose a graphical representation of events, X-AMR, anchored around individual mentions using a cross-document version of Abstract Meaning Representation. We then linearize the ECR with a novel multi-hop coreference algorithm over the event graphs. The event graphs simplify ECR, making it a) LLM cost-effective, b) compositional and interpretable, and c) easily annotated. For a fair assessment, we first enrich an existing ECR benchmark dataset with these event graphs using an annotator-friendly tool we introduce. Then, we employ GPT-4, the newest LLM by OpenAI, for these annotations. Finally, using the ECR algorithm, we assess GPT-4 against humans and analyze its limitations. Through this research, we aim to advance the state-of-the-art for efficient ECR and shed light on the potential shortcomings of current LLMs at this task. Code and annotations: https://github.com/ahmeshaf/gpt_coref

Submitted 24 March, 2024; originally announced April 2024.

Comments: LREC-COLING 2024 main conference

arXiv:2403.15407 [pdf, other]

X-AMR Annotation Tool

Authors: Shafiuddin Rehan Ahmed, Jon Z. Cai, Martha Palmer, James H. Martin

Abstract: This paper presents a novel Cross-document Abstract Meaning Representation (X-AMR) annotation tool designed for annotating key corpus-level event semantics. Leveraging machine assistance through the Prodigy Annotation Tool, we enhance the user experience, ensuring ease and efficiency in the annotation process. Through empirical analyses, we demonstrate the effectiveness of our tool in augmenting an existing event corpus, highlighting its advantages when integrated with GPT-4. Code and annotations: https://github.com/ahmeshaf/gpt_coref

Submitted 29 February, 2024; originally announced March 2024.

Comments: EACL 2024 System Demonstration

arXiv:2403.04761 [pdf, other]

doi 10.1145/3613904.3642001

DeepSee: Multidimensional Visualizations of Seabed Ecosystems

Authors: Adam Coscia, Haley M. Sapers, Noah Deutsch, Malika Khurana, John S. Magyar, Sergio A. Parra, Daniel R. Utter, Rebecca L. Wipfler, David W. Caress, Eric J. Martin, Jennifer B. Paduan, Maggie Hendrie, Santiago Lombeyda, Hillary Mushkin, Alex Endert, Scott Davidoff, Victoria J. Orphan

Abstract: Scientists studying deep ocean microbial ecosystems use limited numbers of sediment samples collected from the seafloor to characterize important life-sustaining biogeochemical cycles in the environment. Yet conducting fieldwork to sample these extreme remote environments is both expensive and time consuming, requiring tools that enable scientists to explore the sampling history of field sites and predict where taking new samples is likely to maximize scientific return. We conducted a collaborative, user-centered design study with a team of scientific researchers to develop DeepSee, an interactive data workspace that visualizes 2D and 3D interpolations of biogeochemical and microbial processes in context together with sediment sampling history overlaid on 2D seafloor maps. Based on a field deployment and qualitative interviews, we found that DeepSee increased the scientific return from limited sample sizes, catalyzed new research workflows, reduced long-term costs of sharing data, and supported teamwork and communication between team members with diverse research goals.

Submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted to CHI 2024. 16 pages, 7 figures, 2 tables. For a demo video, see https://youtu.be/HJ4zbueJ9cs . For a live demo, visit https://www.its.caltech.edu/~datavis/deepsee/ . The source code is available at https://github.com/orphanlab/DeepSee

arXiv:2402.07462 [pdf]

A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse?

Authors: Nathan I. N. Henry, Mangor Pedersen, Matt Williams, Jamin L. B. Martin, Liesje Donkin

Abstract: The value-loading problem is a significant challenge for researchers aiming to create artificial intelligence (AI) systems that align with human values and preferences. This problem requires a method to define and regulate safe and optimal limits of AI behaviors. In this work, we propose HALO (Hormetic ALignment via Opponent processes), a regulatory paradigm that uses hormetic analysis to regulate the behavioral patterns of AI. Behavioral hormesis is a phenomenon where low frequencies of a behavior have beneficial effects, while high frequencies are harmful. By modeling behaviors as allostatic opponent processes, we can use either Behavioral Frequency Response Analysis (BFRA) or Behavioral Count Response Analysis (BCRA) to quantify the hormetic limits of repeatable behaviors. We demonstrate how HALO can solve the 'paperclip maximizer' scenario, a thought experiment where an unregulated AI tasked with making paperclips could end up converting all matter in the universe into paperclips. Our approach may be used to help create an evolving database of 'values' based on the hedonic calculus of repeatable behaviors with decreasing marginal utility. This positions HALO as a promising solution for the value-loading problem, which involves embedding human-aligned values into an AI system, and the weak-to-strong generalization problem, which explores whether weak models can supervise stronger models as they become more intelligent.
Hence, HALO opens several research avenues that may lead to the development of a computational value system that allows an AI algorithm to learn whether the decisions it makes are right or wrong.

Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 24 pages, 7 figures

MSC Class: 68T01; 68T37; 68T42 ACM Class: I.2.0; I.2.8; I.2.11

arXiv:2401.03306 [pdf, other]

MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning

Authors: Rafael Rafailov, Kyle Hatch, Victor Kolev, John D. Martin, Mariano Phielipp, Chelsea Finn

Abstract: We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have achieved significant progress in sample efficiency and the complexity of the tasks they can solve, yet remain under-utilized in the fine-tuning setting. In this work, we argue that existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains due to issues with distribution shifts, off-dynamics data, and non-stationary rewards. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization, while preventing model exploitation by controlling epistemic uncertainty. We find that our approach successfully solves tasks from the MetaWorld benchmark, as well as the Franka Kitchen robot manipulation environment completely from images. To the best of our knowledge, MOTO is the first method to solve this environment from pixels.

Submitted 6 January, 2024; originally announced January 2024.

Comments: This is an updated version of a manuscript that originally appeared at CoRL 2023. The project website is here https://sites.google.com/view/mo2o

Journal ref: Proceedings of The 7th Conference on Robot Learning, PMLR 229:3654-3671, 2023

arXiv:2312.10257 [pdf, other]

The Physics-Informed Neural Network Gravity Model: Generation III

Authors: John Martin, Hanspeter Schaub

Abstract: Scientific machine learning and the advent of the Physics-Informed Neural Network (PINN) have shown high potential in their ability to solve complex differential equations. One example is the use of PINNs to solve the gravity field modeling problem -- learning convenient representations of the gravitational potential from position and acceleration data. These PINN gravity models, or PINN-GMs, have demonstrated advantages in model compactness, robustness to noise, and sample efficiency when compared to popular alternatives; however, further investigation has revealed various failure modes for these and other machine learning gravity models which this manuscript aims to address. Specifically, this paper introduces the third generation Physics-Informed Neural Network Gravity Model (PINN-GM-III) which includes design changes that solve the problems of feature divergence, bias towards low-altitude samples, numerical instability, and extrapolation error. Six evaluation metrics are proposed to expose these past pitfalls and illustrate the PINN-GM-III's robustness to them. This study concludes by evaluating the PINN-GM-III modeling accuracy on a heterogeneous density asteroid, and comparing its performance to other analytic and machine learning gravity models.

Submitted 13 August, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 40 pages, 10 figures, submitted to The Journal of Astronautical Sciences

arXiv:2312.09394 [pdf, other]

doi 10.1109/ACCESS.2024.3427012

HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents

Authors: Dániel Horváth, Jesús Bujalance Martín, Ferenc Gábor Erdős, Zoltán Istenes, Fabien Moutarde

Abstract: Even though reinforcement-learning-based algorithms achieved superhuman performance in many domains, the field of robotics poses significant challenges as the state and action spaces are continuous, and the reward function is predominantly sparse. Furthermore, on many occasions, the agent is devoid of access to any form of demonstration. Inspired by human learning, in this work, we propose a method named highlight experience replay (HiER) that creates a secondary highlight replay buffer for the most relevant experiences. For the weights update, the transitions are sampled from both the standard and the highlight experience replay buffer. It can be applied with or without the techniques of hindsight experience replay (HER) and prioritized experience replay (PER). Our method significantly improves the performance of the state-of-the-art, validated on 8 tasks of three robotic benchmarks. Furthermore, to exploit the full potential of HiER, we propose HiER+ in which HiER is enhanced with an arbitrary data collection curriculum learning method. Our implementation, the qualitative results, and a video presentation are available on the project site: http://www.danielhorvath.eu/hier/.

Submitted 26 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Published in IEEE Access

arXiv:2311.10928 [pdf, other]

CAMRA: Copilot for AMR Annotation

Authors: Jon Z. Cai, Shafiuddin Rehan Ahmed, Julia Bonn, Kristin Wright-Bettner, Martha Palmer, James H. Martin

Abstract: In this paper, we introduce CAMRA (Copilot for AMR Annotations), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompasses all essential features of existing AMR editors, including example lookup, while going a step further by integrating Propbank roleset lookup as an autocomplete feature within the tool. Notably, CAMRA incorporates AMR parser models as coding co-pilots, greatly enhancing the efficiency and accuracy of AMR annotators. To demonstrate the tool's capabilities, we provide a live demo accessible at: https://camra.colorado.edu

Submitted 20 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: EMNLP 2023 System Demonstration

arXiv:2311.01468 [pdf, other]

Remember what you did so you know what to do next

Authors: Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, Ralph Weischedel

Abstract: We explore using a moderately sized large language model (GPT-J 6B parameters) to create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments. Previously published empirical work claimed that large language models (LLMs) are a poor fit (Wang et al., 2022) compared to reinforcement learning. Using the Markov assumption (a single previous step), the LLM outperforms the reinforcement learning-based approach by a factor of 1.4. When we fill the LLM's input buffer with as many prior steps as possible, improvement rises to 3.5x. Even when training on only 6.5% of the training data, we observe a 2.2x improvement over the reinforcement-learning-based approach. Our experiments show that performance varies widely across the 30 classes of actions, indicating that averaging over tasks can hide significant performance issues. In work contemporaneous with ours, Lin et al. (2023) demonstrated a two-part approach (SwiftSage) that uses a small LLM (T5-large) complemented by OpenAI's massive LLMs to achieve outstanding results in ScienceWorld. Our 6-B parameter, single-stage GPT-J matches the performance of SwiftSage's two-stage architecture when it incorporates GPT-3.5 turbo which has 29-times more parameters than GPT-J.

Submitted 30 October, 2023; originally announced November 2023.

Comments: Identical to EMNLP 2023 Findings

arXiv:2310.07084 [pdf, other]

Investigating the Adversarial Robustness of Density Estimation Using the Probability Flow ODE

Authors: Marius Arvinte, Cory Cornelius, Jason Martin, Nageen Himayat

Abstract: Beyond their impressive sampling capabilities, score-based diffusion models offer a powerful analysis tool in the form of unbiased density estimation of a query sample under the training data distribution. In this work, we investigate the robustness of density estimation using the probability flow (PF) neural ordinary differential equation (ODE) model against gradient-based likelihood maximization attacks and the relation to sample complexity, where the compressed size of a sample is used as a measure of its complexity. We introduce and evaluate six gradient-based log-likelihood maximization attacks, including a novel reverse integration attack. Our experimental evaluations on CIFAR-10 show that density estimation using the PF ODE is robust against high-complexity, high-likelihood attacks, and that in some cases adversarial samples are semantically meaningful, as expected from a robust estimator.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.16744 [pdf]

Predicting Long-term Renal Impairment in Post-COVID-19 Patients with Machine Learning Algorithms

Authors: Maitham G. Yousif, Hector J. Castro, John Martin, Hayder A. Albaqer, Fadhil G. Al-Amran, Habeeb W. Shubber, Salman Rawaf

Abstract: The COVID-19 pandemic has had far-reaching implications for global public health. As we continue to grapple with its consequences, it becomes increasingly clear that post-COVID-19 complications are a significant concern. Among these complications, renal impairment has garnered particular attention due to its potential long-term health impacts. This study, conducted with a cohort of 821 post-COVID-19 patients from diverse regions of Iraq across the years 2021, 2022, and 2023, endeavors to predict the risk of long-term renal impairment using advanced machine learning algorithms. Our findings have the potential to revolutionize post-COVID-19 patient care by enabling early identification and intervention for those at risk of renal impairment, ultimately improving clinical outcomes. This research encompasses comprehensive data collection and preprocessing, feature selection, and the development of predictive models using various machine learning algorithms. The study's objectives are to assess the incidence of long-term renal impairment in post-COVID-19 patients, identify associated risk factors, create predictive models, and evaluate their accuracy. We anticipate that our machine learning models, drawing from a rich dataset, will provide valuable insights into the risk of renal impairment, ultimately enhancing patient care and quality of life. In conclusion, the research presented herein offers a critical contribution to the field of post-COVID-19 care. By harnessing the power of machine learning, we aim to predict long-term renal impairment risk accurately. These predictions have the potential to inform healthcare professionals, enabling them to take proactive measures and provide targeted interventions for post-COVID-19 patients at risk of renal complications, thus minimizing the impact of this serious health concern.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.16066 [pdf, other]

Label Augmentation Method for Medical Landmark Detection in Hip Radiograph Images

Authors: Yehyun Suh, Peter Chan, J. Ryan Martin, Daniel Moyer

Abstract: This work reports the empirical performance of an automated medical landmark detection method for predicting clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample-efficient estimators. We train a generic U-Net-based architecture under a curriculum consisting of two phases: initially relaxing the landmarking task by enlarging the label points to regions, then gradually eroding these label regions back to the base task. We measure the benefits of this approach on six datasets of radiographs with gold-standard expert annotations.

Submitted 8 December, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.09993 [pdf]

Long-term Neurological Sequelae in Post-COVID-19 Patients: A Machine Learning Approach to Predict Outcomes

Authors: Hayder A. Albaqer, Kadhum J. Al-Jibouri, John Martin, Fadhil G. Al-Amran, Salman Rawaf, Maitham G. Yousif

Abstract: The COVID-19 pandemic has brought to light a concerning aspect of long-term neurological complications in post-recovery patients. This study delved into the investigation of such neurological sequelae in a cohort of 500 post-COVID-19 patients, encompassing individuals with varying illness severity. The primary aim was to predict outcomes using a machine learning approach based on diverse clinical data and neuroimaging parameters. The results revealed that 68% of the post-COVID-19 patients reported experiencing neurological symptoms, with fatigue, headache, and anosmia being the most common manifestations. Moreover, 22% of the patients exhibited more severe neurological complications, including encephalopathy and stroke. The application of machine learning models showed promising results in predicting long-term neurological outcomes. Notably, the Random Forest model achieved an accuracy of 85%, sensitivity of 80%, and specificity of 90% in identifying patients at risk of developing neurological sequelae. These findings underscore the importance of continuous monitoring and follow-up care for post-COVID-19 patients, particularly in relation to potential neurological complications. The integration of machine learning-based outcome prediction offers a valuable tool for early intervention and personalized treatment strategies, aiming to improve patient care and clinical decision-making. In conclusion, this study sheds light on the prevalence of long-term neurological complications in post-COVID-19 patients and demonstrates the potential of machine learning in predicting outcomes, thereby contributing to enhanced patient management and better health outcomes. Further research and larger studies are warranted to validate and refine these predictive models and to gain deeper insights into the underlying mechanisms of post-COVID-19 neurological sequelae.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2308.16258 [pdf, other]

Robust Principles: Architectural Design Principles for Adversarially Robust CNNs

Authors: ShengYun Peng, Weilin Xu, Cory Cornelius, Matthew Hull, Kevin Li, Rahul Duggal, Mansi Phute, Jason Martin, Duen Horng Chau

Abstract: Our research aims to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs. To accomplish our goal, we synthesize a suite of three generalizable robust architectural design principles: (a) optimal range for depth and width configurations, (b) preferring convolutional over patchify stem stage, and (c) robust residual block design through adopting squeeze and excitation blocks and non-parametric smooth activation functions. Through extensive experiments across a wide spectrum of dataset scales, adversarial training methods, model parameters, and network design spaces, our principles consistently and markedly improve AutoAttack accuracy: 1-3 percentage points (pp) on CIFAR-10 and CIFAR-100, and 4-9 pp on ImageNet. The code is publicly available at https://github.com/poloclub/robust-principles.

Submitted 31 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Published at BMVC'23

arXiv:2308.07540 [pdf, other]

doi 10.1609/aiide.v19i1.27534

CALYPSO: LLMs as Dungeon Masters' Assistants

Authors: Andrew Zhu, Lara J. Martin, Andrew Head, Chris Callison-Burch

Abstract: The role of a Dungeon Master, or DM, in the game Dungeons & Dragons is to perform multiple tasks simultaneously. The DM must digest information about the game setting and monsters, synthesize scenes to present to other players, and respond to the players' interactions with the scene. Doing all of these tasks while maintaining consistency within the narrative and story world is no small feat of human cognition, making the task tiring and unapproachable to new players. Large language models (LLMs) like GPT-3 and ChatGPT have shown remarkable abilities to generate coherent natural language text. In this paper, we conduct a formative evaluation with DMs to establish the use cases of LLMs in D&D and tabletop gaming generally. We introduce CALYPSO, a system of LLM-powered interfaces that support DMs with information and inspiration specific to their own scenario. CALYPSO distills game context into bite-sized prose and helps brainstorm ideas without distracting the DM from the game. When given access to CALYPSO, DMs reported that it generated high-fidelity text suitable for direct presentation to players, and low-fidelity ideas that the DM could develop further while maintaining their creative agency. We see CALYPSO as exemplifying a paradigm of AI-augmented tools that provide synchronous creative assistance within established game worlds, and tabletop gaming more broadly.

Submitted 14 August, 2023; originally announced August 2023.

Comments: 11 pages, 4 figures. AIIDE 2023

Journal ref: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2023

arXiv:2307.14628 [pdf, other]

Rapid and Scalable Bayesian AB Testing

Authors: Srivas Chennu, Andrew Maher, Christian Pangerl, Subash Prabanantham, Jae Hyeon Bae, Jamie Martin, Bud Goswami

Abstract: AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners, and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need of sequential testing for early stopping, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address the above limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stopping, without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to enable the extraction of composite global learnings from past AB tests, to accelerate future tests. We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation. We demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry.

Submitted 27 July, 2023; originally announced July 2023.

Comments: The 10th IEEE International Conference On Data Science And Advanced Analytics

arXiv:2306.05434 [pdf, other]

How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation?

Authors: Shafiuddin Rehan Ahmed, Abhijnan Nath, Michael Regan, Adam Pollins, Nikhil Krishnaswamy, James H. Martin

Abstract: Annotating cross-document event coreference links is a time-consuming and cognitively demanding task that can compromise annotation quality and efficiency. To address this, we propose a model-in-the-loop annotation approach for event coreference resolution, where a machine learning model suggests likely coreferring event pairs only. We evaluate the effectiveness of this approach by first simulating the annotation process and then, using a novel annotator-centric Recall-Annotation effort trade-off metric, we compare the results of various underlying models and datasets. We finally present a method for obtaining 97% recall while substantially reducing the workload required by a fully manual annotation process. Code and data can be found at https://github.com/ahmeshaf/model_in_coref

Submitted 6 June, 2023; originally announced June 2023.

Comments: The 17th Linguistic Annotation Workshop, 2023 (LAW-XVII) short paper. 10 pages, 6 figures, 1 table

arXiv:2305.05672 [pdf, other]

$2 * n$ is better than $n^2$: Decomposing Event Coreference Resolution into Two Tractable Problems

Authors: Shafiuddin Rehan Ahmed, Abhijnan Nath, James H. Martin, Nikhil Krishnaswamy

Abstract: Event Coreference Resolution (ECR) is the task of linking mentions of the same event either within or across documents. Most mention pairs are not coreferent, yet many that are coreferent can be identified through simple techniques such as lemma matching of the event triggers or the sentences in which they appear. Existing methods for training coreference systems sample from a largely skewed distribution, making it difficult for the algorithm to learn coreference beyond surface matching. Additionally, these methods are intractable because of the quadratic operations needed. To address these challenges, we break the problem of ECR into two parts: a) a heuristic to efficiently filter out a large number of non-coreferent pairs, and b) a training approach on a balanced set of coreferent and non-coreferent mention pairs. By following this approach, we show that we get comparable results to the state of the art on two popular ECR datasets while significantly reducing compute requirements. We also analyze the mention pairs that are "hard" to accurately classify as coreferent or non-coreferent. Code at https://github.com/ahmeshaf/lemma_ce_coref

Submitted 9 May, 2023; originally announced May 2023.

Comments: Findings of the Association for Computational Linguistics, ACL 2023. 13 pages, 7 figures, 6 tables

arXiv:2305.01528 [pdf, other]

doi 10.18653/v1/2023.acl-long.229

FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information

Authors: Andrew Zhu, Karmanya Aggarwal, Alexander Feng, Lara J. Martin, Chris Callison-Burch

Abstract: Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) that have access to state information can generate higher quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created and was not a true gold standard game state. We present FIREBALL, a large dataset containing nearly 25,000 unique sessions from real D&D gameplay on Discord with true game state info. We recorded game play sessions of players who used the Avrae bot, which was developed to aid people in playing D&D online, capturing language, game commands and underlying game state information. We demonstrate that FIREBALL can improve natural language generation (NLG) by using Avrae state information, improving both automated metrics and human judgments of quality. Additionally, we show that LLMs can generate executable Avrae commands, particularly after finetuning.

Submitted 25 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 21 pages, 2 figures. Accepted at ACL 2023

Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 4171-4193

arXiv:2304.09996 [pdf, other]

Robust Route Planning with Distributional Reinforcement Learning in a Stochastic Road Network Environment

Authors: Xi Lin, Paul Szenher, John D. Martin, Brendan Englot

Abstract: Route planning is essential to mobile robot navigation problems. In recent years, deep reinforcement learning (DRL) has been applied to learning optimal planning policies in stochastic environments without prior knowledge. However, existing works focus on learning policies that maximize the expected return, the performance of which can vary greatly when the level of stochasticity in the environment is high. In this work, we propose a distributional reinforcement learning based framework that learns return distributions which explicitly reflect environmental stochasticity. Policies based on the second-order stochastic dominance (SSD) relation can be used to make adjustable route decisions according to user preference on performance robustness. Our proposed method is evaluated in a simulated road network environment, and experimental results show that our method is able to plan the shortest routes that minimize stochasticity in travel time when robustness is preferred, while other state-of-the-art DRL methods are agnostic to environmental stochasticity.

Submitted 19 April, 2023; originally announced April 2023.

Comments: The 20th International Conference on Ubiquitous Robots (UR 2023)

arXiv:2302.13048 [pdf, other]

doi 10.18653/v1/2023.acl-demo.1

Human-in-the-Loop Schema Induction

Authors: Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Liyang Zhou, Hainiu Xu, Li Zhang, Lara J. Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Brown, Reece Suchocki, Chris Callison-Burch

Abstract: Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction (IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic elements, manual editing of those elements, and conversion of those into a schema graph. By qualitatively comparing our system to previous ones, we show that our system not only transfers to new domains more easily than previous approaches, but also reduces the effort of human curation thanks to our interactive interface.

Submitted 25 February, 2023; originally announced February 2023.

Comments: 10 pages, ACL2023 demo track

arXiv:2302.12944 [pdf, other]

Dependency Dialogue Acts -- Annotation Scheme and Case Study

Authors: Jon Z. Cai, Brendan King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James H. Martin, Martha Palmer, Marilyn Walker, Jeffrey Flanigan

Abstract: In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker-intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to the dialogue acts and rhetorical relations. It represents the functional, discourse, and response structure in multi-party multi-threaded conversations. A few key features distinguish DDA from existing dialogue annotation frameworks such as SWBD-DAMSL and the ISO 24617-2 standard. First, DDA prioritizes the relational structure of the dialogue units and the dialog context, annotating both dialog acts and rhetorical relations as response relations to particular utterances. Second, DDA embraces overloading in dialogues, encouraging annotators to specify multiple response relations and dialog acts for each dialog unit. Lastly, DDA places an emphasis on adequately capturing how a speaker is using the full dialog context to plan and organize their speech. With these features, DDA is highly expressive and recall-oriented with regard to conversation dynamics between multiple speakers. In what follows, we present the DDA annotation framework and case studies annotating DDA structures in multi-party, multi-threaded conversations.

Submitted 24 February, 2023; originally announced February 2023.

Comments: The 13th International Workshop on Spoken Dialogue Systems Technology

Journal ref: The 13th International Workshop on Spoken Dialogue Systems Technology 2023

arXiv:2302.01337 [pdf]

doi 10.1109/CSCI58124.2022.00289

Comprehensive and user-analytics-friendly cancer patient database for physicians and researchers

Authors: Ali Firooz, Avery T. Funkhouser, Julie C. Martin, W. Jeffery Edenfield, Homayoun Valafar, Anna V. Blenda

Abstract: Nuanced cancer patient care is needed, as the development and clinical course of cancer is multifactorial with influences from the general health status of the patient, germline and neoplastic mutations, co-morbidities, and environment. To effectively tailor an individualized treatment to each patient, such multifactorial data must be presented to providers in an easy-to-access and easy-to-analyze fashion. To address the need, a relational database has been developed integrating status of cancer-critical gene mutations, serum galectin profiles, serum and tumor glycomic profiles, with clinical, demographic, and lifestyle data points of individual cancer patients. The database, as a backend, provides physicians and researchers with a single, easily accessible repository of cancer profiling data to aid-in and enhance individualized treatment. Our interactive database allows care providers to amalgamate cohorts from these groups to find correlations between different data types with the possibility of finding "molecular signatures" based upon a combination of genetic mutations, galectin serum levels, glycan compositions, and patient clinical data and lifestyle choices. Our project provides a framework for an integrated, interactive, and growing database to analyze molecular and clinical patterns across cancer stages and subtypes and provides opportunities for increased diagnostic and prognostic power.

Submitted 1 February, 2023; originally announced February 2023.

Comments: 7 pages, 12 figures, peer reviewed and accepted in "International Conference on Computational Science and Computational Intelligence (CSCI 22)"

Journal ref: Proceedings of the 2022 International Conference on Computational Science and Computational Intelligence (CSCI)

arXiv:2301.08104 [pdf, other]

doi 10.1609/icwsm.v17i1.22141

Author as Character and Narrator: Deconstructing Personal Narratives from the r/AmITheAsshole Reddit Community

Authors: Salvatore Giorgi, Ke Zhao, Alexander H. Feng, Lara J. Martin

Abstract: In the r/AmITheAsshole subreddit, people anonymously share first person narratives that contain some moral dilemma or conflict and ask the community to judge who is at fault (i.e., who is "the asshole"). In general, first person narratives are a unique storytelling domain where the author is the narrator (the person telling the story) but can also be a character (the person living the story) and, thus, the author has two distinct voices presented in the story. In this study, we identify linguistic and narrative features associated with the author as the character or as a narrator. We use these features to answer the following questions: (1) what makes an asshole character and (2) what makes an asshole narrator? We extract both Author-as-Character features (e.g., demographics, narrative event chain, and emotional arc) and Author-as-Narrator features (i.e., the style and emotion of the story as a whole) in order to identify which aspects of the narrative are correlated with the final moral judgment. Our work shows that "assholes" as Characters frame themselves as lacking agency with a more positive personal arc, while "assholes" as Narrators will tell emotional and opinionated stories.

Submitted 15 March, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: Accepted to the 17th International AAAI Conference on Web and Social Media (ICWSM), 2023

Journal ref: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM) 2023, 17(1), 233-244

arXiv:2301.03110 [pdf, other]

RobArch: Designing Robust Architectures against Adversarial Attacks

Authors: ShengYun Peng, Weilin Xu, Cory Cornelius, Kevin Li, Rahul Duggal, Duen Horng Chau, Jason Martin

Abstract: Adversarial Training is the most effective approach for improving the robustness of Deep Neural Networks (DNNs). However, compared to the large body of research in optimizing the adversarial training process, there are few investigations into how architecture components affect robustness, and they rarely constrain model capacity. Thus, it is unclear where robustness precisely comes from. In this work, we present the first large-scale systematic study on the robustness of DNN architecture components under fixed parameter budgets. Through our investigation, we distill 18 actionable robust network design guidelines that empower model developers to gain deep insights. We demonstrate these guidelines' effectiveness by introducing the novel Robust Architecture (RobArch) model that instantiates the guidelines to build a family of top-performing models across parameter capacities against strong adversarial attacks. RobArch achieves the new state-of-the-art AutoAttack accuracy on the RobustBench ImageNet leaderboard. The code is available at https://github.com/ShengYun-Peng/RobArch.

Submitted 8 January, 2023; originally announced January 2023.

arXiv:2301.00280 [pdf]

RECOMED: A Comprehensive Pharmaceutical Recommendation System

Authors: Mariam Zomorodi, Ismail Ghodsollahee, Jennifer H. Martin, Nicholas J. Talley, Vahid Salari, Pawel Plawiak, Kazem Rahimi, U. Rajendra Acharya

Abstract: A comprehensive pharmaceutical recommendation system was designed based on the patients and drugs features extracted from Drugs.com and Druglib.com. First, data from these databases were combined, and a dataset of patients and drug information was built. Secondly, the patients and drugs were clustered, and then the recommendation was performed using different ratings provided by patients, and importantly by the knowledge obtained from patients and drug specifications, and considering drug interactions. To the best of our knowledge, we are the first group to consider patients conditions and history in the proposed approach for selecting a specific medicine appropriate for that particular user. Our approach applies artificial intelligence (AI) models for the implementation. Sentiment analysis using natural language processing approaches is employed in pre-processing along with neural network-based methods and recommender system algorithms for modeling the system. In our work, patients conditions and drugs features are used for making two models based on matrix factorization. Then we used drug interaction to filter drugs with severe or mild interactions with other drugs. We developed a deep learning model for recommending drugs by using data from 2304 patients as a training set, and then we used data from 660 patients as our validation set. After that, we used knowledge from critical information about drugs and combined the outcome of the model into a knowledge-based system with the rules obtained from constraints on taking medicine.

Submitted 21 August, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: 39 pages, 14 figures, 13 tables

arXiv:2212.10754 [pdf, other]

doi 10.18653/v1/2023.findings-acl.832

CoRRPUS: Code-based Structured Prompting for Neurosymbolic Story Understanding

Authors: Yijiang River Dong, Lara J. Martin, Chris Callison-Burch

Abstract: Story generation and understanding -- as with all NLG/NLU tasks -- has seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic means to be even better and to make up for any flaws that the neural networks might have. However, symbolic methods are extremely costly in terms of the amount of time and expertise needed to create them. In this work, we capitalize on state-of-the-art Code-LLMs, such as Codex, to bootstrap the use of symbolic methods for tracking the state of stories and aiding in story understanding. We show that our CoRRPUS system and abstracted prompting procedures can beat current state-of-the-art structured LLM techniques on pre-existing story understanding tasks (bAbI Task 2 and Re^3) with minimal hand engineering. We hope that this work can help highlight the importance of symbolic representations and specialized prompting for LLMs as these models require some guidance for performing reasoning tasks properly.

Submitted 8 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted to Findings of ACL 2023

Journal ref: Findings of ACL 2023, pp. 13152-13168

arXiv:2212.10420 [pdf, other]

Settling the Reward Hypothesis

Authors: Michael Bowling, John D. Martin, David Abel, Will Dabney

Abstract: The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hypothesis holds.

Submitted 16 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.10283 [pdf, other]

Interpretable models for extrapolation in scientific machine learning

Authors: Eric S. Muckley, James E. Saal, Bryce Meredig, Christopher S. Roper, John H. Martin

Abstract: Data-driven models are central to scientific discovery. In efforts to achieve state-of-the-art model accuracy, researchers are employing increasingly complex machine learning algorithms that often outperform simple regressions in interpolative settings (e.g. random k-fold cross-validation) but suffer from poor extrapolation performance, portability, and human interpretability, which limits their potential for facilitating novel scientific insight. Here we examine the trade-off between model performance and interpretability across a broad range of science and engineering problems with an emphasis on materials science datasets. We compare the performance of black box random forest and neural network machine learning algorithms to that of single-feature linear regressions which are fitted using interpretable input features discovered by a simple random search algorithm. For interpolation problems, the average prediction errors of linear regressions were twice as high as those of black box models. Remarkably, when prediction tasks required extrapolation, linear models yielded average error only 5% higher than that of black box models, and outperformed black box models in roughly 40% of the tested prediction tasks, which suggests that they may be desirable over complex algorithms in many extrapolation problems because of their superior interpretability, computational overhead, and ease of use. The results challenge the common assumption that extrapolative models for scientific machine learning are constrained by an inherent trade-off between performance and interpretability.

Submitted 16 December, 2022; originally announced December 2022.

Comments: DISTRIBUTION STATEMENT A (Approved for Public Release, Distribution Unlimited)

arXiv:2212.09821 [pdf, other]

Reduced Order Model of a Generic Submarine for Maneuvering Near the Surface

Authors: J. Ezequiel Martin, Maxwell Hammond, Nicholas Rober, Yakin Kim, Venanzio Cichella, Pablo Carrica

Abstract: A reduced order model of a generic submarine is presented. Computational fluid dynamics (CFD) results are used to create and validate a model that includes depth dependence and the effect of waves on the craft. The model and the procedure to obtain its coefficients are discussed, and examples of the data used to obtain the model coefficients are presented. An example of operation following a complex path is presented and results from the reduced order model are compared to those from an equivalent CFD calculation. The controller implemented to complete these maneuvers is also presented.

Submitted 19 December, 2022; originally announced December 2022.

Comments: Presented at the 34th Symposium on Naval Hydrodynamics, Washington DC, USA, 26 June - 1 July 2022

arXiv:2210.07109 [pdf, other]

doi 10.18653/v1/2022.emnlp-main.637

Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence

Authors: Chris Callison-Burch, Gaurav Singh Tomar, Lara J. Martin, Daphne Ippolito, Suma Bailis, David Reitter

Abstract: AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 900 games, with a total of 7,000 players, 800,000 dialogue turns, 500,000 dice rolls, and 58 million words. We automatically annotate the data with partial state information about the game play. We train a large language model (LM) to generate the next game turn, conditioning it on different information. The LM can respond as a particular character or as the player who runs the game--i.e., the Dungeon Master (DM). It is trained to produce dialogue that is either in-character (roleplaying in the fictional world) or out-of-character (discussing rules or strategy). We perform a human evaluation to determine what factors make the generated output plausible and interesting. We further perform an automatic evaluation to determine how well the model can predict the game state given the history and examine how well tracking the game state improves its ability to produce plausible conversational output.

Submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP 2022

Journal ref: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 9379-9393, Dec. 2022

arXiv:2209.14030 [pdf, other]

doi 10.4204/EPTCS.371.15

Monitoring ROS2: from Requirements to Autonomous Robots

Authors: Ivan Perez, Anastasia Mavridou, Tom Pressburger, Alexander Will, Patrick J. Martin

Abstract: Runtime verification (RV) has the potential to enable the safe operation of safety-critical systems that are too complex to formally verify, such as Robot Operating System 2 (ROS2) applications. Writing correct monitors can itself be complex, and errors in the monitoring subsystem threaten the mission as a whole. This paper provides an overview of a formal approach to generating runtime monitors for autonomous robots from requirements written in a structured natural language. Our approach integrates the Formal Requirement Elicitation Tool (FRET) with Copilot, a runtime verification framework, through the Ogma integration tool. FRET is used to specify requirements with unambiguous semantics, which are then automatically translated into temporal logic formulae. Ogma generates monitor specifications from the FRET output, which are compiled into hard-real time C99. To facilitate integration of the monitors in ROS2, we have extended Ogma to generate ROS2 packages defining monitoring nodes, which run the monitors when new data becomes available, and publish the results of any violations. The goal of our approach is to treat the generated ROS2 packages as black boxes and integrate them into larger ROS2 systems with minimal effort.

Submitted 28 September, 2022; originally announced September 2022.

Comments: In Proceedings FMAS2022 ASYDE2022, arXiv:2209.13181

ACM Class: D.2.1; D.2.4; I.2.9;

Journal ref: EPTCS 371, 2022, pp. 208-216

arXiv:2208.03609 [pdf, other]

Continual Learning for Tumor Classification in Histopathology Images

Authors: Veena Kaustaban, Qinle Ba, Ipshita Bhattacharya, Nahil Sobh, Satarupa Mukherjee, Jim Martin, Mohammad Saleh Miri, Christoph Guetter, Amal Chaturvedi

Abstract: Recent years have seen great advancements in the development of deep learning models for histopathology image analysis in digital pathology applications, evidenced by the increasingly common deployment of these models in both research and clinical settings. Although such models have shown unprecedented performance in solving fundamental computational tasks in DP applications, they suffer from catastrophic forgetting when adapted to unseen data with transfer learning. With an increasing need for deep learning models to handle ever changing data distributions, including evolving patient population and new diagnosis assays, continual learning models that alleviate model forgetting need to be introduced in DP based analysis. However, to our best knowledge, there is no systematic study of such models for DP-specific applications. Here, we propose CL scenarios in DP settings, where histopathology image data from different sources/distributions arrive sequentially, the knowledge of which is integrated into a single model without training all the data from scratch. We then established an augmented dataset for colorectal cancer H&E classification to simulate shifts of image appearance and evaluated CL model performance in the proposed CL scenarios. We leveraged a breast tumor H&E dataset along with the colorectal cancer to evaluate CL from different tumor types. In addition, we evaluated CL methods in an online few-shot setting under the constraints of annotation and computational resources. We revealed promising results of CL in DP applications, potentially paving the way for application of these methods in clinical practice.

Submitted 6 August, 2022; originally announced August 2022.

Comments: Accepted by MOVI, a MICCAI2022 workshop: https://sites.google.com/view/movi2022

arXiv:2207.10719 [pdf, other]

Synthetic Dataset Generation for Adversarial Machine Learning Research

Authors: Xiruo Liu, Shibani Singh, Cory Cornelius, Colin Busho, Mike Tan, Anindya Paul, Jason Martin

Abstract: Existing adversarial example research focuses on digitally inserted perturbations on top of existing natural image datasets. This construction of adversarial examples is not realistic because it may be difficult, or even impossible, for an attacker to deploy such an attack in the real-world due to sensing and environmental effects. To better understand adversarial examples against cyber-physical systems, we propose approximating the real-world through simulation. In this paper we describe our synthetic dataset generation tool that enables scalable collection of such a synthetic dataset with realistic adversarial examples. We use the CARLA simulator to collect such a dataset and demonstrate simulated attacks that undergo the same environmental transforms and processing as real-world images. Our tools have been used to collect datasets to help evaluate the efficacy of adversarial examples, and can be found at https://github.com/carla-simulator/carla/pull/4992.

Submitted 21 July, 2022; originally announced July 2022.

Journal ref: AdvML Frontiers 2022

arXiv:2206.13960 [pdf, ps, other]

Dynamic Memory for Interpretable Sequential Optimisation

Authors: Srivas Chennu, Andrew Maher, Jamie Martin, Subash Prabanantham

Abstract: Real-world applications of reinforcement learning for recommendation and experimentation faces a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these non-stationary cases, the agent must forget some historical knowledge, as it may no longer be relevant to minimise regret. We present a solution to handling non-stationarity that is suitable for deployment at scale, to provide business operators with automated adaptive optimisation. Our solution aims to provide interpretable learning that can be trusted by humans, whilst responding to non-stationarity to minimise regret. To this end, we develop an adaptive Bayesian learning agent that employs a novel form of dynamic memory. It enables interpretability through statistical hypothesis testing, by targeting a set point of statistical power when comparing rewards and adjusting its memory dynamically to achieve this power. By design, the agent is agnostic to different kinds of non-stationarity. Using numerical simulations, we compare its performance against an existing proposal and show that, under multiple non-stationary scenarios, our agent correctly adapts to real changes in the true rewards. In all bandit solutions, there is an explicit trade-off between learning and achieving maximal performance. Our solution sits on a different point on this trade-off when compared to another similarly robust approach: we prioritise interpretability, which relies on more learning, at the cost of some regret. We describe the architecture of a large-scale deployment of automatic optimisation-as-a-service where our agent achieves interpretability whilst adapting to changing circumstances.

Submitted 28 June, 2022; originally announced June 2022.

Comments: 2nd International Workshop on Online and Adaptive Recommender Systems, 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, Washington DC

arXiv:2206.02848 [pdf]

Plagiarism deterrence for introductory programming

Authors: Simon J. Cohen, Michael J. Martin, Chance A. Shipley, Abhishek Kumar, Andrew R. Cohen

Abstract: Plagiarism in introductory programming courses is an enormous challenge for both students and institutions. For students, relying on the work of others too early in their academic development can make it impossible to acquire necessary skills for independent success in the future. For institutions, widespread student cheating can dilute the quality of the educational experience being offered. Currently available solutions consider only pairwise comparisons between student submissions and focus on punitive deterrence. Our approach instead relies on a class-wide statistical characterization that can be clearly and securely shared with students via an intuitive new p-value representing independence of student effort. A pairwise, compression-based similarity detection algorithm captures relationships between assignments more accurately. An automated deterrence system is used to warn students that their behavior is being closely monitored. High-confidence instances are made directly available for instructor review using our open-source toolkit. An unbiased scoring system aids students and the instructor in understanding true independence of effort. Preliminary results indicate that the system can provide meaningful measurements of independence from week one, improving the efficacy of technical education.

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.10736 [pdf, other]

Should Models Be Accurate?

Authors: Esra'a Saleh, John D. Martin, Anna Koop, Arash Pourzarabi, Michael Bowling

Abstract: Model-based Reinforcement Learning (MBRL) holds promise for data-efficiency by planning with model-generated experience in addition to learning with experience from the environment. However, in complex or changing environments, models in MBRL will inevitably be imperfect, and their detrimental effects on learning can be difficult to mitigate. In this work, we question whether the objective of these models should be the accurate simulation of environment dynamics at all. We focus our investigations on Dyna-style planning in a prediction setting. First, we highlight and support three motivating points: a perfectly accurate model of environment dynamics is not practically achievable, is not necessary, and is not always the most useful anyways. Second, we introduce a meta-learning algorithm for training models with a focus on their usefulness to the learner instead of their accuracy in modelling the environment. Our experiments show that in a simple non-stationary environment, our algorithm enables faster learning than even using an accurate model built with domain-specific knowledge of the non-stationarity.

Submitted 22 May, 2022; originally announced May 2022.

Comments: The 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022)

arXiv:2205.01254 [pdf, other]

Deep API Learning Revisited

Authors: James Martin, Jin L. C. Guo

Abstract: Understanding the correct API usage sequences is one of the most important tasks for programmers when they work with unfamiliar libraries. However, programmers often encounter obstacles to finding the appropriate information due to either poor quality of API documentation or ineffective query-based searching strategy. To help solve this issue, researchers have proposed various methods to suggest the sequence of APIs given natural language queries representing the information needs from programmers. Among such efforts, Gu et al. adopted a deep learning method, in particular an RNN Encoder-Decoder architecture, to perform this task and obtained promising results on common APIs in Java. In this work, we aim to reproduce their results and apply the same methods for APIs in Python. Additionally, we compare the performance with a more recent Transformer-based method, i.e., CodeBERT, for the same task. Our experiment reveals a clear drop in performance measures when careful data cleaning is performed. Owing to the pretraining from a large number of source code files and effective encoding technique, CodeBERT outperforms the method by Gu et al., to a large extent.

Submitted 2 May, 2022; originally announced May 2022.

Comments: 10 pages, 6 figures. This paper is accepted at ICPC 2022 (the 30th IEEE/ACM International Conference on Program Comprehension)

ACM Class: D.2.13

Showing 1–50 of 134 results for author: Martin, J