Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 118 results for author: Thomas, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.16241  [pdf, other

    cs.LG stat.ME

    Position: Benchmarking is Limited in Reinforcement Learning Research

    Authors: Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas

    Abstract: Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite numerous calls for improvements, experimental practices continue to produce misleading or unsupported claims. One reason for the ongoing substandard practices is… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 13 figures, The Forty-first International Conference on Machine Learning (ICML 2024)

  2. arXiv:2406.10082  [pdf, other

    eess.AS cs.CV cs.SD

    Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

    Authors: Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass

    Abstract: Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in noise. Since videos are harder to obtain than audio, the video training data of AVSR models is usually limited to a few thousand hours. In contrast, speech models such as Whisper are trained with hundreds of thousands of hours of data, and thus learn a better speech-to-text decoder. The huge training data differe… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024. Code https://github.com/roudimit/whisper-flamingo

  3. arXiv:2406.05646  [pdf, other

    cs.LG

    ICU-Sepsis: A Benchmark MDP Built from Real Medical Data

    Authors: Kartik Choudhary, Dhawal Gupta, Philip S. Thomas

    Abstract: We present ICU-Sepsis, an environment that can be used in benchmarks for evaluating reinforcement learning (RL) algorithms. Sepsis management is a complex task that has been an important topic in applied RL research in recent years. Therefore, MDPs that model sepsis management can serve as part of a benchmark to evaluate RL algorithms on a challenging real-world problem. However, creating usable M… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Reinforcement Learning Conference 2024

  4. arXiv:2403.10652  [pdf, other

    cs.LG q-fin.RM

    Improving Fairness in Credit Lending Models using Subgroup Threshold Optimization

    Authors: Cecilia Ying, Stephen Thomas

    Abstract: In an effort to improve the accuracy of credit lending decisions, many financial intuitions are now using predictions from machine learning models. While such predictions enjoy many advantages, recent research has shown that the predictions have the potential to be biased and unfair towards certain subgroups of the population. To combat this, several techniques have been introduced to help remove… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Neural Information Processing Systems (NeurIPS) Workshop in Strategic ML

  5. arXiv:2402.19450  [pdf, other

    cs.AI cs.CL

    Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap

    Authors: Saurabh Srivastava, Annarose M B, Anto P V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, Sooraj Thomas

    Abstract: We propose a framework for robust evaluation of reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance over the static version of a problem compared to a snapshot of the functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 37 pages, 10 figures

  6. arXiv:2402.19062  [pdf, other

    eess.IV cs.CV cs.LG

    Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach

    Authors: Sarina Thomas, Cristiana Tiago, Børge Solli Andreassen, Svein Arne Aase, Jurica Šprem, Erik Steen, Anne Solberg, Guy Ben-Yosef

    Abstract: To facilitate diagnosis on cardiac ultrasound (US), clinical practice has established several standard views of the heart, which serve as reference points for diagnostic measurements and define viewports from which images are acquired. Automatic view recognition involves grouping those images into classes of standard views. Although deep learning techniques have been successful in achieving this,… ▽ More

    Submitted 1 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Presented at ASMUS - MICCAI conference 2023, Vancouver

  7. arXiv:2402.12008  [pdf, other

    cs.LG cs.AI stat.ML

    Cluster Metric Sensitivity to Irrelevant Features

    Authors: Miles McCrory, Spencer A. Thomas

    Abstract: Clustering algorithms are used extensively in data analysis for data exploration and discovery. Technological advancements lead to continually growth of data in terms of volume, dimensionality and complexity. This provides great opportunities in data analytics as the data can be interrogated for many different purposes. This however leads challenges, such as identification of relevant features for… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

  8. arXiv:2402.09390  [pdf, other

    cs.AI cs.CL

    HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation

    Authors: Yihao Fang, Stephen W. Thomas, Xiaodan Zhu

    Abstract: With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations has emerged as a significant concern. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrie… ▽ More

    Submitted 2 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  9. arXiv:2401.02843  [pdf, other

    cs.CY cs.AI cs.LG

    Thousands of AI Authors on the Future of AI

    Authors: Katja Grace, Harlan Stewart, Julia Fabienne Sandkühler, Stephen Thomas, Ben Weinstein-Raun, Jan Brauner

    Abstract: In the largest survey of its kind, 2,778 researchers who had published in top-tier artificial intelligence (AI) venues gave predictions on the pace of AI progress and the nature and impacts of advanced AI systems The aggregate forecasts give at least a 50% chance of AI systems achieving several milestones by 2028, including autonomously constructing a payment processing site from scratch, creating… ▽ More

    Submitted 30 April, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: The asterisk indicates the corresponding author. The dagger indicates equal contribution

  10. arXiv:2312.12972  [pdf, other

    cs.LG

    From Past to Future: Rethinking Eligibility Traces

    Authors: Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

    Abstract: In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we delve into the nuances of eligibility traces and explore instances where their updates may result in unexpected credit assignment to preceding states. From this investigation emerges the concept of a novel value function, which we refer to as the \emph{bidirectional value functio… ▽ More

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted in The 38th Annual AAAI Conference on Artificial Intelligence

  11. arXiv:2310.19007  [pdf, other

    cs.LG

    Behavior Alignment via Reward Function Optimization

    Authors: Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

    Abstract: Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outco… ▽ More

    Submitted 31 October, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: (Spotlight) Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  12. arXiv:2310.15358  [pdf, other

    cs.LG cs.CY stat.ML

    Learning Fair Representations with High-Confidence Guarantees

    Authors: Yuhong Luo, Austin Hoag, Philip S. Thomas

    Abstract: Representation learning is increasingly employed to generate representations that are predictive across multiple downstream tasks. The development of representation learning algorithms that provide strong fairness guarantees is thus important because it can prevent unfairness towards disadvantaged groups for all downstream prediction tasks. To prevent unfairness towards disadvantaged groups in all… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

  13. arXiv:2310.01210  [pdf, other

    eess.IV cs.CV cs.LG

    Towards Robust Cardiac Segmentation using Graph Convolutional Networks

    Authors: Gilles Van De Vyver, Sarina Thomas, Guy Ben-Yosef, Sindre Hellum Olaisen, Håvard Dalen, Lasse Løvstakken, Erik Smistad

    Abstract: Fully automatic cardiac segmentation can be a fast and reproducible method to extract clinical measurements from an echocardiography examination. The U-Net architecture is the current state-of-the-art deep learning architecture for medical segmentation and can segment cardiac structures in real-time with average errors comparable to inter-observer variability. However, this architecture still gene… ▽ More

    Submitted 2 July, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  14. arXiv:2308.13517  [pdf, other

    cs.CL cs.AI

    ChatGPT as Data Augmentation for Compositional Generalization: A Case Study in Open Intent Detection

    Authors: Yihao Fang, Xianzhi Li, Stephen W. Thomas, Xiaodan Zhu

    Abstract: Open intent detection, a crucial aspect of natural language understanding, involves the identification of previously unseen intents in user-generated text. Despite the progress made in this field, challenges persist in handling new combinations of language components, which is essential for compositional generalization. In this paper, we present a case study exploring the use of ChatGPT as a data… ▽ More

    Submitted 25 August, 2023; originally announced August 2023.

    Journal ref: Proceedings of the Joint Workshop of the 5th Financial Technology and Natural Language Processing (FinNLP) and 2nd Multimodal AI For Financial Forecasting (Muffin), Macao, August 20, 2023

  15. Large Language Models to Identify Social Determinants of Health in Electronic Health Records

    Authors: Marco Guevara, Shan Chen, Spencer Thomas, Tafadzwa L. Chaunzwa, Idalid Franco, Benjamin Kann, Shalini Moningi, Jack Qian, Madeleine Goldstein, Susan Harper, Hugo JWL Aerts, Guergana K. Savova, Raymond H. Mak, Danielle S. Bitterman

    Abstract: Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from the electronic health records (EHR). This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text for improving the extraction of these scarcely documente… ▽ More

    Submitted 5 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: Peer-reviewed version published at NPJ Digital Medicine: https://www.nature.com/articles/s41746-023-00970-0

    Journal ref: NPJ Digit Med. 2024 Jan 11;7(1):6

  16. arXiv:2305.16717  [pdf, other

    eess.IV cs.CV cs.LG

    Shape-based pose estimation for automatic standard views of the knee

    Authors: Lisa Kausch, Sarina Thomas, Holger Kunze, Jan Siad El Barbari, Klaus Maier-Hein

    Abstract: Surgical treatment of complicated knee fractures is guided by real-time imaging using a mobile C-arm. Immediate and continuous control is achieved via 2D anatomy-specific standard views that correspond to a specific C-arm pose relative to the patient positioning, which is currently determined manually, following a trial-and-error approach at the cost of time and radiation dose. The characteristics… ▽ More

    Submitted 26 May, 2023; originally announced May 2023.

  17. arXiv:2305.12606  [pdf, other

    cs.CL cs.SD eess.AS

    Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

    Authors: Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

    Abstract: Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each. However, there are thousands of spoken languages worldwide, and adapting to new languages is an important problem. In this work, we aim to understand which model adapts better to languages unseen during pre-training. We fine-tune both mo… ▽ More

    Submitted 30 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

  18. arXiv:2305.09838  [pdf, other

    cs.LG cs.AI

    Coagent Networks: Generalized and Scaled

    Authors: James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

    Abstract: Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpropagation-based deep learning (BDL) that overcomes some of backpropagation's main limitations. For example, coagent networks can compute different par… ▽ More

    Submitted 16 May, 2023; originally announced May 2023.

  19. FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2

    Authors: Kohav Dey, Krishna Bajaj, K S Ramalakshmi, Samuel Thomas, Sriram Radhakrishna

    Abstract: Marine ecosystems are vital for the planet's health, but human activities such as climate change, pollution, and overfishing pose a constant threat to marine species. Accurate classification and monitoring of these species can aid in understanding their distribution, population dynamics, and the impact of human activities on them. However, classifying marine species can be challenging due to their… ▽ More

    Submitted 4 April, 2023; originally announced April 2023.

  20. arXiv:2303.16990  [pdf, other

    cs.CV

    What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

    Authors: Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

    Abstract: Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only. Models for this task are usually trained with human-annotated sentences and bounding box supervision. This work addresses this task from a multimodal supervision perspective, proposing a framework for spatio-temporal action grounding trained on loose video an… ▽ More

    Submitted 28 May, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: To be presented at CVPR 2024. Project page: https://brian7685.github.io/STG/

  21. arXiv:2302.03161  [pdf, other

    cs.LG

    Optimization using Parallel Gradient Evaluations on Multiple Parameters

    Authors: Yash Chandak, Shiv Shankar, Venkata Gandikota, Philip S. Thomas, Arya Mazumdar

    Abstract: We propose a first-order method for convex optimization, where instead of being restricted to the gradient from a single parameter, gradients from multiple parameters can be used during each step of gradient descent. This setup is particularly useful when a few processors are available that can be used in parallel for optimization. Our method uses gradients from multiple parameters in synergy to u… ▽ More

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at OPT workshop @ Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  22. arXiv:2301.10330  [pdf, other

    cs.LG cs.AI

    Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

    Authors: Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskil, Philip S. Thomas

    Abstract: Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes due to external factors (passive non-stationarity), changes induced by interactions with the system itself (active non-stationarity), or both (hybrid non-station… ▽ More

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: Accepted at Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  23. arXiv:2212.03932  [pdf, other

    cs.LG cs.AI

    Low Variance Off-policy Evaluation with State-based Importance Sampling

    Authors: David M. Bossens, Philip S. Thomas

    Abstract: In many domains, the exploration process of reinforcement learning will be too costly as it requires trying out suboptimal policies, resulting in a need for off-policy evaluation, in which a target policy is evaluated based on data collected from a known behaviour policy. In this context, importance sampling estimators provide estimates for the expected return by weighting the trajectory based on… ▽ More

    Submitted 4 May, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

  24. arXiv:2210.14304  [pdf, other

    cs.CL q-fin.CP

    Learning Better Intent Representations for Financial Open Intent Classification

    Authors: Xianzhi Li, Will Aitken, Xiaodan Zhu, Stephen W. Thomas

    Abstract: With the recent surge of NLP technologies in the financial domain, banks and other financial entities have adopted virtual agents (VA) to assist customers. A challenging problem for VAs in this domain is determining a user's reason or intent for contacting the VA, especially when the intent was unseen or open during the VA's training. One method for handling open intents is adaptive decision bound… ▽ More

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted to FinNLP-2022, in conjunction with EMNLP-2022

  25. arXiv:2210.03625  [pdf, other

    cs.CL cs.CV cs.MM

    C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

    Authors: Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

    Abstract: Multilingual text-video retrieval methods have improved significantly in recent years, but the performance for other languages lags behind English. We propose a Cross-Lingual Cross-Modal Knowledge Distillation method to improve multilingual text-video retrieval. Inspired by the fact that English text-video retrieval outperforms other languages, we train a student model using input text in differen… ▽ More

    Submitted 9 May, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted at ICASSP 2023. The code, models, and dataset are available at https://github.com/roudimit/c2kd

  26. arXiv:2209.01779  [pdf, other

    eess.IV cs.CV cs.LG

    Representation Learning for Non-Melanoma Skin Cancer using a Latent Autoencoder

    Authors: Simon Myles Thomas

    Abstract: Generative learning is a powerful tool for representation learning, and shows particular promise for problems in biomedical imaging. However, in this context, sampling from the distribution is secondary to finding representations of real images, which often come with labels and explicitly represent the content and quality of the target distribution. It remains difficult to faithfully reconstruct i… ▽ More

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: 5 figures, 11 pages

  27. arXiv:2208.11744  [pdf, other

    cs.LG cs.AI cs.CY

    Enforcing Delayed-Impact Fairness Guarantees

    Authors: Aline Weber, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva

    Abstract: Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on peoples' lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term. This is because prior fairness-aware algorithms only consider static fairness constraints, such as equal opportunity… ▽ More

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: 24 pages, 5 figures

  28. arXiv:2208.03528  [pdf, other

    cs.CR

    MetaEmu: An Architecture Agnostic Rehosting Framework for Automotive Firmware

    Authors: Zitai Chen, Sam L. Thomas, Flavio D. Garcia

    Abstract: In this paper we present MetaEmu, an architecture-agnostic emulator synthesizer geared towards rehosting and security analysis of automotive firmware. MetaEmu improves over existing rehosting environments in two ways: Firstly, it solves the hitherto open-problem of a lack of generic Virtual Execution Environments (VXEs) for rehosting by synthesizing processor simulators from Ghidra's language defi… ▽ More

    Submitted 6 August, 2022; originally announced August 2022.

  29. arXiv:2207.13965  [pdf, other

    eess.AS cs.SD

    Extending RNN-T-based speech recognition systems with emotion and language classification

    Authors: Zvi Kons, Hagai Aronowitz, Edmilson Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon

    Abstract: Speech transcription, emotion recognition, and language identification are usually considered to be three different tasks. Each one requires a different model with a different architecture and training process. We propose using a recurrent neural network transducer (RNN-T)-based speech-to-text (STT) system as a common component that can be used for emotion recognition and language identification a… ▽ More

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted for publication in Interspeech 2022

  30. Fast Data Driven Estimation of Cluster Number in Multiplex Images using Embedded Density Outliers

    Authors: Spencer A. Thomas

    Abstract: The usage of chemical imaging technologies is becoming a routine accompaniment to traditional methods in pathology. Significant technological advances have developed these next generation techniques to provide rich, spatially resolved, multidimensional chemical images. The rise of digital pathology has significantly enhanced the synergy of these imaging modalities with optical microscopy and immun… ▽ More

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: 8 pages, 6 figures, conference paper

  31. arXiv:2207.05749  [pdf

    cs.LG cs.AI cs.CL cs.CV eess.IV

    Towards Highly Expressive Machine Learning Models of Non-Melanoma Skin Cancer

    Authors: Simon M. Thomas, James G. Lefevre, Glenn Baxter, Nicholas A. Hamilton

    Abstract: Pathologists have a rich vocabulary with which they can describe all the nuances of cellular morphology. In their world, there is a natural pairing of images and words. Recent advances demonstrate that machine learning models can now be trained to learn high-quality image features and represent them as discrete units of information. This enables natural language, which is also discrete, to be join… ▽ More

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: 12 figures, 29 pages

    ACM Class: I.2.7; I.2.10

  32. arXiv:2207.02549  [pdf, other

    cs.CV

    Light-weight spatio-temporal graphs for segmentation and ejection fraction prediction in cardiac ultrasound

    Authors: Sarina Thomas, Andrew Gilbert, Guy Ben-Yosef

    Abstract: Accurate and consistent predictions of echocardiography parameters are important for cardiovascular diagnosis and treatment. In particular, segmentations of the left ventricle can be used to derive ventricular volume, ejection fraction (EF) and other relevant measurements. In this paper we propose a new automated method called EchoGraphs for predicting ejection fraction and segmenting the left ven… ▽ More

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted to MICCAI 2022

  33. arXiv:2206.13910  [pdf, other

    q-bio.PE cs.LG math.OC physics.soc-ph

    Epidemic Control Modeling using Parsimonious Models and Markov Decision Processes

    Authors: Edilson F. Arruda, Tarun Sharma, Rodrigo e A. Alexandre, Sinnu Susan Thomas

    Abstract: Many countries have experienced at least two waves of the COVID-19 pandemic. The second wave is far more dangerous as distinct strains appear more harmful to human health, but it stems from the complacency about the first wave. This paper introduces a parsimonious yet representative stochastic epidemic model that simulates the uncertain spread of the disease regardless of the latency and recovery… ▽ More

    Submitted 23 June, 2022; originally announced June 2022.

  34. arXiv:2206.09022  [pdf, other

    cs.LG cs.AI

    Designing MacPherson Suspension Architectures using Bayesian Optimization

    Authors: Sinnu Susan Thomas, Jacopo Palandri, Mohsen Lakehal-ayat, Punarjay Chakravarty, Friedrich Wolf-Monheim, Matthew B. Blaschko

    Abstract: Engineering design is traditionally performed by hand: an expert makes design proposals based on past experience, and these proposals are then tested for compliance with certain target specifications. Testing for compliance is performed first by computer simulation using what is called a discipline model. Such a model can be implemented by a finite element analysis, multibody systems approach, etc… ▽ More

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: 15 pages, 16 figures

  35. arXiv:2206.02380  [pdf, other

    cs.LG cs.AI

    Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL

    Authors: Abhinav Bhatia, Philip S. Thomas, Shlomo Zilberstein

    Abstract: Model-based reinforcement learning promises to learn an optimal policy from fewer interactions with the environment compared to model-free reinforcement learning by learning an intermediate model of the environment in order to predict future interactions. When predicting a sequence of interactions, the rollout length, which limits the prediction horizon, is a critical hyperparameter as accuracy of… ▽ More

    Submitted 7 June, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

  36. arXiv:2204.05188  [pdf, other

    cs.CL cs.SD eess.AS

    Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

    Authors: Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury

    Abstract: Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) have been primarily due to effective pretraining of speech representations. One such pretraining paradigm is the distillation of semantic knowledge from state-of-the-art text-based models like BERT to speech encoder neural networks. This work is a step towards doing the same in a much more efficient and fine-grained manner whe… ▽ More

    Submitted 1 July, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: 5 pages, 2 figures

  37. arXiv:2204.05169  [pdf, other

    cs.CL cs.AI

    Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding

    Authors: Vishal Sunder, Samuel Thomas, Hong-Kwang J. Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier

    Abstract: Dialog history plays an important role in spoken language understanding (SLU) performance in a dialog system. For end-to-end (E2E) SLU, previous work has used dialog history in text form, which makes the model dependent on a cascaded automatic speech recognizer (ASR). This rescinds the benefits of an E2E system which is intended to be compact and robust to ASR errors. In this paper, we propose a h… ▽ More

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: 5 pages, 1 figure

  38. arXiv:2203.00006  [pdf, other

    cs.CL cs.SD eess.AS

    Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems

    Authors: Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury, George Saon

    Abstract: The lack of speech data annotated with labels required for spoken language understanding (SLU) is often a major hurdle in building end-to-end (E2E) systems that can directly process speech inputs. In contrast, large amounts of text data with suitable labels are usually available. In this paper, we propose a novel text representation and training methodology that allows E2E SLU systems to be effect… ▽ More

    Submitted 26 February, 2022; originally announced March 2022.

    Comments: \c{opyright}2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. arXiv admin note: text overlap with arXiv:2202.13155

  39. arXiv:2202.13155  [pdf, other

    cs.CL cs.SD eess.AS

    Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

    Authors: Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo

    Abstract: Compared to hybrid automatic speech recognition (ASR) systems that use a modular architecture in which each component can be independently adapted to a new domain, recent end-to-end (E2E) ASR system are harder to customize due to their all-neural monolithic construction. In this paper, we propose a novel text representation and training framework for E2E ASR models. With this approach, we show tha… ▽ More

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: \c{opyright}2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  40. arXiv:2202.10137  [pdf, other

    cs.CL eess.AS

    A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets

    Authors: Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury

    Abstract: Intent classifiers are vital to the successful operation of virtual agent systems. This is especially so in voice activated systems where the data can be noisy with many ambiguous directions for user intents. Before operation begins, these classifiers are generally lacking in real-world training data. Active learning is a common approach used to help label large amounts of collected user input. Ho… ▽ More

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: \c{opyright} 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  41. arXiv:2202.06914  [pdf, other

    cs.LG cs.CV

    A Generic Self-Supervised Framework of Learning Invariant Discriminative Features

    Authors: Foivos Ntelemis, Yaochu Jin, Spencer A. Thomas

    Abstract: Self-supervised learning (SSL) has become a popular method for generating invariant representations without the need for human annotations. Nonetheless, the desired invariant representation is achieved by utilising prior online transformation functions on the input data. As a result, each SSL framework is customised for a particular data type, e.g., visual data, and further modifications are requi… ▽ More

    Submitted 21 August, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

  42. arXiv:2201.12105  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Improving End-to-End Models for Set Prediction in Spoken Language Understanding

    Authors: Hong-Kwang J. Kuo, Zoltan Tuske, Samuel Thomas, Brian Kingsbury, George Saon

    Abstract: The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition which aims to produce verbatim transcripts. Advances in end-to-end (E2E) speech modeling have made it possible to train solely on semantic entities, which are far cheaper to collect than verbatim transcripts. We focus on this set prediction problem, where entity… ▽ More

    Submitted 28 January, 2022; originally announced January 2022.

    Comments: ICASSP \c{opyright}2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    ACM Class: I.2.7

  43. arXiv:2112.14681  [pdf, other

    math.NA cs.MS

    Neumann Series in GMRES and Algebraic Multigrid Smoothers

    Authors: Stephen Thomas, Arielle Carr, Paul Mullowney, Ruipeng Li, Kasia Świrydowicz

    Abstract: Neumann series underlie both Krylov methods and algebraic multigrid smoothers. A low-synch modified Gram-Schmidt (MGS)-GMRES algorithm is described that employs a Neumann series to accelerate the projection step. A corollary to the backward stability result of Paige et al. (2006) demonstrates that the truncated Neumann series approximation is sufficient for convergence of GMRES. The lower triangul… ▽ More

    Submitted 29 December, 2021; originally announced December 2021.

  44. arXiv:2112.08597  [pdf

    cs.RO

    Reprogrammable Surfaces Through Star Graph Metamaterials

    Authors: Sawyer Thomas, Jeffrey Lipton

    Abstract: The ability to change a surface's profile allows biological systems to effectively manipulate and blend into their surroundings. Current surface morphing techniques rely either on having a small number of fixed states or on directly driving the entire system. We discovered a subset of scale-independent auxetic metamaterials have a state trajectory with a star-graph structure. At the central node,… ▽ More

    Submitted 15 December, 2021; originally announced December 2021.

  45. arXiv:2112.05812  [pdf, other

    cs.LG

    Edge-Compatible Reinforcement Learning for Recommendations

    Authors: James E. Kostas, Philip S. Thomas, Georgios Theocharous

    Abstract: Most reinforcement learning (RL) recommendation systems designed for edge computing must either synchronize during recommendation selection or depend on an unprincipled patchwork collection of algorithms. In this work, we build on asynchronous coagent policy gradient algorithms \citep{kostas2020asynchronous} to propose a principled solution to this problem. The class of algorithms that we propose… ▽ More

    Submitted 10 August, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

  46. arXiv:2112.04446  [pdf, other

    cs.CV cs.CL cs.SD eess.AS

    Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

    Authors: Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

    Abstract: Multi-modal learning from video data has seen increased attention recently as it allows to train semantically meaningful embeddings without human annotation enabling tasks like zero-shot retrieval and classification. In this work, we present a multi-modal, modality agnostic fusion transformer approach that learns to exchange information between multiple modalities, such as video, audio, and text,… ▽ More

    Submitted 18 August, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: CVPR2022. The final published version of the proceedings will be available on IEEE Xplore

  47. arXiv:2112.00775  [pdf, other

    cs.CV

    Routing with Self-Attention for Multimodal Capsule Networks

    Authors: Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

    Abstract: The task of multimodal learning has seen a growing interest recently as it allows for training neural architectures based on different modalities such as vision, text, and audio. One challenge in training such models is that they need to jointly learn semantic concepts and their relationships across different input representations. Capsule networks have been shown to perform well in context of cap… ▽ More

    Submitted 1 December, 2021; originally announced December 2021.

  48. arXiv:2111.14388  [pdf, other

    eess.IV cs.CV cs.LG

    Enhanced Transfer Learning Through Medical Imaging and Patient Demographic Data Fusion

    Authors: Spencer A. Thomas

    Abstract: In this work we examine the performance enhancement in classification of medical imaging data when image features are combined with associated non-image data. We compare the performance of eight state-of-the-art deep neural networks in classification tasks when using only image features, compared to when these are combined with patient metadata. We utilise transfer learning with networks pretraine… ▽ More

    Submitted 29 November, 2021; originally announced November 2021.

  49. arXiv:2111.09512  [pdf, other

    math.NA cs.MS

    ILU Smoothers for Low Mach Navier-Stokes Pressure Solvers

    Authors: Stephen Thomas, Arielle Carr, Paul Mullowney, Kasia Świrydowicz, Marc Day

    Abstract: Incomplete LU (ILU) smoothers are effective in the algebraic multigrid (AMG) $V$-cycle for reducing high-frequency components of the error. However, the requisite direct triangular solves are comparatively slow on GPUs. Previous work has demonstrated the advantages of Jacobi iteration as an alternative to direct solution of these systems. Depending on the threshold and fill-level parameters chosen… ▽ More

    Submitted 27 November, 2023; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: v2 updated citation information; v3 updated results; v4 abstract updated, new results added; v5 new experimental analysis and results added; v6 shortened theory, improved discussion of applications; v7 final version

  50. arXiv:2111.04823  [pdf, other

    cs.CL cs.CV cs.MM cs.SD eess.AS eess.IV

    Cascaded Multilingual Audio-Visual Learning from Videos

    Authors: Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

    Abstract: In this paper, we explore self-supervised audio-visual models that learn from instructional videos. Prior work has shown that these models can relate spoken words and sounds to visual content after training on a large-scale dataset of videos, but they were only trained and evaluated on videos in English. To learn multilingual audio-visual representations, we propose a cascaded approach that levera… ▽ More

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: Presented at Interspeech 2021. This version contains updated results using the YouCook-Japanese dataset