Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 101 results for author: Maa, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.07140  [pdf, other

    cs.LG stat.ML

    Cardinality-Aware Set Prediction and Top-$k$ Classification

    Authors: Corinna Cortes, Anqi Mao, Christopher Mohri, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. We introduce a new target loss function tailored to this setting that accounts for both the classification error and the cardinality of the set predicted. To optimize this loss function, we propose two families of surrog… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.19625

  2. arXiv:2407.03600  [pdf, other

    cs.CL

    Contrastive Chain-of-Thought Prompting

    Authors: Grant Kruttschnitt, Jay Shim, Alyssa Ma, Daniel Kim, Benjamin Chek, Athul Anand, Kevin Zhu, Sean O'Brien

    Abstract: Rapidly increasing model scales coupled with steering methods such as chain-of-thought prompting have led to drastic improvements in language model reasoning. At the same time, models struggle with compositional generalization and are far from human performance on many reasoning-based benchmarks. Leveraging the success of chain-of-thought prompting, and also taking inspiration from context-aware d… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 6 pages, 0 figures

  3. arXiv:2406.17319  [pdf, other

    cs.CV

    DMF-Net: Image-Guided Point Cloud Completion with Dual-Channel Modality Fusion and Shape-Aware Upsampling Transformer

    Authors: Aihua Mao, Yuxuan Tang, Jiangtao Huang, Ying He

    Abstract: In this paper we study the task of a single-view image-guided point cloud completion. Existing methods have got promising results by fusing the information of image into point cloud explicitly or implicitly. However, given that the image has global shape information and the partial point cloud has rich local details, We believe that both modalities need to be given equal attention when performing… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  4. Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection

    Authors: Zhuo Zheng, Yanfei Zhong, Ailong Ma, Liangpei Zhang

    Abstract: Bitemporal supervised learning paradigm always dominates remote sensing change detection using numerous labeled bitemporal image pairs, especially for high spatial resolution (HSR) remote sensing imagery. However, it is very expensive and labor-intensive to label change regions in large-scale bitemporal HSR remote sensing image pairs. In this paper, we propose single-temporal supervised learning (… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: IJCV 2024. arXiv admin note: text overlap with arXiv:2108.07002

  5. arXiv:2406.10215  [pdf, other

    cs.CL cs.LG

    DevBench: A multimodal developmental benchmark for language learning

    Authors: Alvin Wei Ming Tan, Sunny Yu, Bria Long, Wanjing Anya Ma, Tonya Murray, Rebecca D. Silverman, Jason D. Yeatman, Michael C. Frank

    Abstract: How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in language abilities tested, and wit… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  6. arXiv:2405.07905  [pdf, other

    eess.IV cs.CV

    PLUTO: Pathology-Universal Transformer

    Authors: Dinkar Juyal, Harshith Padigela, Chintan Shah, Daniel Shenker, Natalia Harguindeguy, Yi Liu, Blake Martin, Yibo Zhang, Michael Nercessian, Miles Markey, Isaac Finberg, Kelsey Luu, Daniel Borders, Syed Ashar Javed, Emma Krause, Raymond Biju, Aashish Sood, Allen Ma, Jackson Nyman, John Shamshoian, Guillaume Chhor, Darpan Sanghavi, Marc Thibault, Limin Yu, Fedaa Najdawi , et al. (8 additional authors not shown)

    Abstract: Pathology is the study of microscopic inspection of tissue, and a pathology diagnosis is often the medical gold standard to diagnose disease. Pathology images provide a unique challenge for computer-vision-based analysis: a single pathology Whole Slide Image (WSI) is gigapixel-sized and often contains hundreds of thousands to millions of objects of interest across multiple resolutions. In this wor… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  7. arXiv:2405.05968  [pdf, other

    cs.LG stat.ML

    A Universal Growth Rate for Learning with Smooth Surrogate Losses

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification, providing both upper and lower bounds under mild assumptions. This result also translates to excess error bounds. Our l… ▽ More

    Submitted 8 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  8. arXiv:2403.19625  [pdf, other

    cs.LG stat.ML

    Top-$k$ Classification and Cardinality-Aware Prediction

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of top-$k$ classification, the task of predicting the $k$ most probable classes for an input, extending beyond single-class prediction. We demonstrate that several prevalent surrogate loss functions in multi-class classification, such as comp-sum and constrained losses, are supported by $H$-consistency bounds with respect to the top-$k$ loss. These bounds guarantee cons… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  9. arXiv:2403.19494  [pdf, ps, other

    cs.LG stat.ML

    Regression with Multi-Expert Deferral

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: Learning to defer with multiple experts is a framework where the learner can choose to defer the prediction to several experts. While this problem has received significant attention in classification contexts, it presents unique challenges in regression due to the infinite and continuous nature of the label space. In this work, we introduce a novel framework of regression with deferral, which invo… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  10. arXiv:2403.19480  [pdf, ps, other

    cs.LG stat.ML

    $H$-Consistency Guarantees for Regression

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of $H$-consistency bounds for regression. We first present new theorems that generalize the tools previously given to establish $H$-consistency bounds. This generalization proves essential for analyzing $H$-consistency bounds specific to regression. Next, we prove a series of novel $H$-consistency bounds for surrogate loss functions of the squared loss, under the assump… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  11. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  12. arXiv:2403.00892  [pdf, other

    eess.SY cs.LG

    PowerFlowMultiNet: Multigraph Neural Networks for Unbalanced Three-Phase Distribution Systems

    Authors: Salah Ghamizi, Jun Cao, Aoxiang Ma, Pedro Rodriguez

    Abstract: Efficiently solving unbalanced three-phase power flow in distribution grids is pivotal for grid analysis and simulation. There is a pressing need for scalable algorithms capable of handling large-scale unbalanced power grids that can provide accurate and fast solutions. To address this, deep learning techniques, especially Graph Neural Networks (GNNs), have emerged. However, existing literature pr… ▽ More

    Submitted 12 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  13. arXiv:2402.18078  [pdf, other

    cs.CV

    Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

    Authors: Yanzuo Lu, Manlin Zhang, Andy J Ma, Xiaohua Xie, Jian-Huang Lai

    Abstract: Diffusion model is a promising approach to image generation and has been employed for Pose-Guided Person Image Synthesis (PGPIS) with competitive performance. While existing methods simply align the person appearance to the target pose, they are prone to overfitting due to the lack of a high-level semantic understanding on the source person image. In this paper, we propose a novel Coarse-to-Fine L… ▽ More

    Submitted 9 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024 (Highlight)

  14. arXiv:2402.10434  [pdf, other

    cs.LG

    Parametric Augmentation for Time Series Contrastive Learning

    Authors: Xu Zheng, Tianchun Wang, Wei Cheng, Aitian Ma, Haifeng Chen, Mo Sha, Dongsheng Luo

    Abstract: Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data au… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted by International Conference on Learning Representations (ICLR 2024)

  15. arXiv:2401.16450  [pdf, other

    cs.HC cs.AI cs.SE

    ACCESS: Prompt Engineering for Automated Web Accessibility Violation Corrections

    Authors: Calista Huang, Alyssa Ma, Suchir Vyasamudri, Eugenie Puype, Sayem Kamal, Juan Belza Garcia, Salar Cheema, Michael Lutz

    Abstract: With the increasing need for inclusive and user-friendly technology, web accessibility is crucial to ensuring equal access to online content for individuals with disabilities, including visual, auditory, cognitive, or motor impairments. Despite the existence of accessibility guidelines and standards such as Web Content Accessibility Guidelines (WCAG) and the Web Accessibility Initiative (W3C), ove… ▽ More

    Submitted 10 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: 11 pages, 6 figures

  16. arXiv:2401.16348  [pdf, other

    cs.CL cs.CY cs.HC

    Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis

    Authors: Zongxia Li, Andrew Mao, Daniel Stephens, Pranav Goel, Emily Walpole, Alden Dima, Juan Fung, Jordan Boyd-Graber

    Abstract: Topic models are a popular tool for understanding text collections, but their evaluation has been a point of contention. Automated evaluation metrics such as coherence are often used, however, their validity has been questioned for neural topic models (NTMs) and can overlook a models benefits in real world applications. To this end, we conduct the first evaluation of neural, supervised and classic… ▽ More

    Submitted 19 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 19 pages, 5 tables, 6 figures, Accepted to EACL Main Conference 2024

  17. arXiv:2312.12246  [pdf, other

    cs.CV cs.LG

    MDD-UNet: Domain Adaptation for Medical Image Segmentation with Theoretical Guarantees, a Proof of Concept

    Authors: Asbjørn Munk, Ao Ma, Mads Nielsen

    Abstract: The current state-of-the art techniques for image segmentation are often based on U-Net architectures, a U-shaped encoder-decoder networks with skip connections. Despite the powerful performance, the architecture often does not perform well when used on data which has different characteristics than the data it was trained on. Many techniques for improving performance in the presence of domain shif… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Published at NLDL 2024

  18. arXiv:2312.12222  [pdf, other

    cs.CV

    EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

    Authors: Junjue Wang, Zhuo Zheng, Zihang Chen, Ailong Ma, Yanfei Zhong

    Abstract: Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images,… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted By AAAI 2024

  19. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  20. arXiv:2312.11468  [pdf, other

    physics.med-ph cs.CV

    Bias-Reduced Neural Networks for Parameter Estimation in Quantitative MRI

    Authors: Andrew Mao, Sebastian Flassbeck, Jakob Assländer

    Abstract: Purpose: To develop neural network (NN)-based quantitative MRI parameter estimators with minimal bias and a variance close to the Cramér-Rao bound. Theory and Methods: We generalize the mean squared error loss to control the bias and variance of the NN's estimates, which involves averaging over multiple noise realizations of the same measurements during training. Bias and variance properties of… ▽ More

    Submitted 10 April, 2024; v1 submitted 13 November, 2023; originally announced December 2023.

  21. arXiv:2312.07871  [pdf, other

    cs.CV

    MLNet: Mutual Learning Network with Neighborhood Invariance for Universal Domain Adaptation

    Authors: Yanzuo Lu, Meng Shen, Andy J Ma, Xiaohua Xie, Jian-Huang Lai

    Abstract: Universal domain adaptation (UniDA) is a practical but challenging problem, in which information about the relation between the source and the target domains is not given for knowledge transfer. Existing UniDA methods may suffer from the problems of overlooking intra-domain variations in the target domain and difficulty in separating between the similar known and unknown class. To address these is… ▽ More

    Submitted 27 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024 (Poster)

  22. arXiv:2312.00111  [pdf, other

    cs.LG cond-mat.mtrl-sci

    Multimodal Learning for Materials

    Authors: Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Samuel Kim, Peter Y. Lu, Thomas Christensen, Marin Soljačić

    Abstract: Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials, but also a greater variety and quantity of their associated properties. Existing machine learning effo… ▽ More

    Submitted 12 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  23. arXiv:2311.18495  [pdf, other

    cs.LG cs.CV

    Improving Adversarial Transferability via Model Alignment

    Authors: Avery Ma, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu

    Abstract: Neural networks are susceptible to adversarial perturbations that are transferable across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model's ability in generating transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss. This loss measu… ▽ More

    Submitted 30 November, 2023; originally announced November 2023.

  24. arXiv:2311.10266  [pdf, other

    cs.CL

    Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2

    Authors: Ambri Ma, Arnav Kumar, Brett Zeligson

    Abstract: The training of large language models (LLMs) on extensive, unfiltered corpora sourced from the internet is a common and advantageous practice. Consequently, LLMs have learned and inadvertently reproduced various types of biases, including violent, offensive, and toxic language. However, recent research shows that generative pretrained transformer (GPT) language models can recognize their own biase… ▽ More

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 9 pages

  25. arXiv:2311.02762   

    cs.CV cs.LG

    Fast Sparse 3D Convolution Network with VDB

    Authors: Fangjun Zhou, Anyong Mao, Eftychios Sifakis

    Abstract: We proposed a new Convolution Neural Network implementation optimized for sparse 3D data inference. This implementation uses NanoVDB as the data structure to store the sparse tensor. It leaves a relatively small memory footprint while maintaining high performance. We demonstrate that this architecture is around 20 times faster than the state-of-the-art dense CNN model on a high-resolution 3D objec… ▽ More

    Submitted 14 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: Unauthorized publication

  26. arXiv:2310.19859  [pdf, other

    cs.CV cs.AI

    Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone

    Authors: Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Ao Ma, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

    Abstract: Parameter-efficient tuning has become a trend in transferring large-scale foundation models to downstream applications. Existing methods typically embed some light-weight tuners into the backbone, where both the design and the learning of the tuners are highly dependent on the base model. This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally unbinds tuners from the backbon… ▽ More

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  27. arXiv:2310.17626  [pdf, ps, other

    cs.CV

    A Survey on Transferability of Adversarial Examples across Deep Neural Networks

    Authors: Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr

    Abstract: The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also brought to light a concerning vulnerability: adversarial examples. These crafted inputs, imperceptible to humans, can manipulate machine learning models in… ▽ More

    Submitted 1 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to Transactions on Machine Learning Research (TMLR)

  28. arXiv:2310.14774  [pdf, ps, other

    cs.LG stat.ML

    Principled Approaches for Learning to Defer with Multiple Experts

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a study of surrogate losses and algorithms for the general problem of learning to defer with multiple experts. We first introduce a new family of surrogate losses specifically tailored for the multiple-expert setting, where the prediction and deferral functions are learned simultaneously. We then prove that these surrogate losses benefit from strong $H$-consistency bounds. We illustrate… ▽ More

    Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ISAIM 2024

  29. arXiv:2310.14772  [pdf, other

    cs.LG stat.ML

    Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We study the key framework of learning with abstention in the multi-class classification setting. In this setting, the learner can choose to abstain from making a prediction with some pre-defined cost. We present a series of new theoretical and algorithmic results for this learning problem in the predictor-rejector framework. We introduce several new families of surrogate losses for which we prove… ▽ More

    Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ALT 2024

  30. arXiv:2310.14770  [pdf, ps, other

    cs.LG stat.ML

    Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: Learning with abstention is a key scenario where the learner can abstain from making a prediction at some cost. In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. We introduce new families of surrogate losses for the abstention loss function, which include the state-of-the-art surrogate losses in the single-stage setting and… ▽ More

    Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: AISTATS 2024

  31. arXiv:2310.06837  [pdf, other

    cs.CL cs.LG

    Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency

    Authors: Eric Zelikman, Wanjing Anya Ma, Jasmine E. Tran, Diyi Yang, Jason D. Yeatman, Nick Haber

    Abstract: Developing an educational test can be expensive and time-consuming, as each item must be written by experts and then evaluated by collecting hundreds of student responses. Moreover, many tests require multiple distinct sets of questions administered throughout the school year to closely monitor students' progress, known as parallel tests. In this study, we focus on tests of silent sentence reading… ▽ More

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Main)

  32. arXiv:2309.17031  [pdf, other

    cs.CV cs.AI

    Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

    Authors: Zhuo Zheng, Shiqi Tian, Ailong Ma, Liangpei Zhang, Yanfei Zhong

    Abstract: Understanding the temporal dynamics of Earth's surface is a mission of multi-temporal remote sensing image analysis, significantly promoted by deep vision models with its fuel -- labeled multi-temporal images. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present a sca… ▽ More

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  33. arXiv:2309.03893  [pdf, other

    cs.CV cs.AI cs.LG

    DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

    Authors: Manlin Zhang, Jie Wu, Yuxi Ren, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma

    Abstract: Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diver… ▽ More

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Code and Models are publicly available. Project Page: https://mettyz.github.io/DiffusionEngine

  34. arXiv:2308.16904  [pdf, other

    math.NA cs.LG math.OC

    A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems

    Authors: El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma

    Abstract: Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is li… ▽ More

    Submitted 31 August, 2023; originally announced August 2023.

    MSC Class: 15A06; 15A09; 15A10; 15A18; 65F10; 65Y20; 68Q25; 68W20; 68W40

  35. arXiv:2308.06703  [pdf, other

    cs.LG

    Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods

    Authors: Avery Ma, Yangchen Pan, Amir-massoud Farahmand

    Abstract: Stochastic gradient descent (SGD) and adaptive gradient methods, such as Adam and RMSProp, have been widely used in training deep neural networks. We empirically show that while the difference between the standard generalization performance of models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations. Notably, our investigation de… ▽ More

    Submitted 28 November, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

    Comments: Accepted at TMLR (Featured Certification). Code: see https://github.com/averyma/opt-robust

  36. arXiv:2307.02035  [pdf, ps, other

    cs.LG stat.ML

    Ranking with Abstention

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We introduce a novel framework of ranking with abstention, where the learner can abstain from making prediction at some limited cost $c$. We present a extensive theoretical analysis of this framework including a series of $H$-consistency bounds for both the family of linear functions and that of neural networks with one hidden-layer. These theoretical guarantees are the state-of-the-art consistenc… ▽ More

    Submitted 5 July, 2023; originally announced July 2023.

  37. arXiv:2306.08838  [pdf, other

    cs.LG cs.CR stat.ML

    Differentially Private Domain Adaptation with Theoretical Guarantees

    Authors: Raef Bassily, Corinna Cortes, Anqi Mao, Mehryar Mohri

    Abstract: In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain, somewhat close to the target domain. This is the modern problem of supervised domain adaptation from a public source to… ▽ More

    Submitted 4 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

  38. arXiv:2306.04730  [pdf, other

    eess.SP cs.LG math.NA math.OC stat.ML

    Stochastic Natural Thresholding Algorithms

    Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

    Abstract: Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and disc… ▽ More

    Submitted 7 June, 2023; originally announced June 2023.

  39. arXiv:2306.00357  [pdf, other

    stat.ML cs.HC cs.LG math.PR math.ST

    Efficient and Robust Bayesian Selection of Hyperparameters in Dimension Reduction for Visualization

    Authors: Yin-Ting Liao, Hengrui Luo, Anna Ma

    Abstract: We introduce an efficient and robust auto-tuning framework for hyperparameter selection in dimension reduction (DR) algorithms, focusing on large-scale datasets and arbitrary performance metrics. By leveraging Bayesian optimization (BO) with a surrogate model, our approach enables efficient hyperparameter selection with multi-objective trade-offs and allows us to perform data-driven sensitivity an… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 20 pages, 16 figures

    MSC Class: 62F15; 68T09; 94A16

  40. arXiv:2305.17358  [pdf, other

    cs.CL

    CTC-based Non-autoregressive Speech Translation

    Authors: Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu

    Abstract: Combining end-to-end speech translation (ST) and non-autoregressive (NAR) generation is promising in language and speech processing for their advantages of less error propagation and low latency. In this paper, we investigate the potential of connectionist temporal classification (CTC) for non-autoregressive speech translation (NAST). In particular, we develop a model consisting of two encoders th… ▽ More

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Main Conference

  41. arXiv:2305.17356  [pdf, other

    cs.CL

    Bridging the Granularity Gap for Acoustic Modeling

    Authors: Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, JingBo Zhu

    Abstract: While Transformer has become the de-facto standard for speech, modeling upon the fine-grained frame-level features remains an open challenge of capturing long-distance dependencies and distributing the attention weights. We propose \textit{Progressive Down-Sampling} (PDS) which gradually compresses the acoustic features into coarser-grained units containing more complete semantic information, like… ▽ More

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Findings

  42. arXiv:2305.05948  [pdf, other

    cs.CL cs.AI

    Multi-Path Transformer is Better: A Case Study on Neural Machine Translation

    Authors: Ye Lin, Shuhan Zhou, Yanyang Li, Anxiang Ma, Tong Xiao, Jingbo Zhu

    Abstract: For years the model performance in machine learning obeyed a power-law relationship with the model size. For the consideration of parameter efficiency, recent studies focus on increasing model depth rather than width to achieve better performance. In this paper, we study how model width affects the Transformer model through a parameter-efficient multi-path structure. To better fuse features extrac… ▽ More

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: accepted by EMNLP2022 Findings

  43. arXiv:2304.07288  [pdf, other

    cs.LG stat.ML

    Cross-Entropy Loss Functions: Theoretical Analysis and Applications

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network, when the softmax is used. But, what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cr… ▽ More

    Submitted 19 June, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: ICML 2023

  44. arXiv:2303.00262  [pdf, other

    cs.CV cs.GR cs.LG

    Collage Diffusion

    Authors: Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian

    Abstract: We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion harmonizes the input layers to make objects fit together -- the key challenge involves minimizing changes in the positions and key visual attributes of the input l… ▽ More

    Submitted 31 August, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  45. arXiv:2212.10190  [pdf, other

    cs.CL

    Pay Attention to Your Tone: Introducing a New Dataset for Polite Language Rewrite

    Authors: Xun Wang, Tao Ge, Allen Mao, Yuki Li, Furu Wei, Si-Qing Chen

    Abstract: We introduce \textsc{PoliteRewrite} -- a dataset for polite language rewrite which is a novel sentence rewrite task. Compared with previous text style transfer tasks that can be mostly addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits over an offensive and impolite sentence to deliver the same message euphemisti… ▽ More

    Submitted 20 December, 2022; originally announced December 2022.

  46. arXiv:2212.08583  [pdf, other

    cs.CV

    Semi-Siamese Network for Robust Change Detection Across Different Domains with Applications to 3D Printing

    Authors: Yushuo Niu, Ethan Chadwick, Anson W. K. Ma, Qian Yang

    Abstract: Automatic defect detection for 3D printing processes, which shares many characteristics with change detection problems, is a vital step for quality control of 3D printed products. However, there are some critical challenges in the current state of practice. First, existing methods for computer vision-based process monitoring typically work well only under specific camera viewpoints and lighting si… ▽ More

    Submitted 2 August, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

  47. arXiv:2212.03296  [pdf, other

    cs.CL cs.AI

    Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain Question Answering

    Authors: Wanrong He, Andrew Mao, Jordan Boyd-Graber

    Abstract: For humans and computers, the first step in answering an open-domain question is retrieving a set of relevant documents from a large corpus. However, the strategies that computers use fundamentally differ from those of humans. To better understand these differences, we design a gamified interface for data collection -- Cheater's Bowl -- where a human answers complex questions with access to both t… ▽ More

    Submitted 15 November, 2022; originally announced December 2022.

    Comments: Findings of EMNLP 2022

  48. arXiv:2211.00113  [pdf, other

    cs.LG cs.CV

    SAGE: Saliency-Guided Mixup with Optimal Rearrangements

    Authors: Avery Ma, Nikita Dvornik, Ran Zhang, Leila Pishdad, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: Data augmentation is a key element for training accurate models by reducing overfitting and improving generalization. For image classification, the most popular data augmentation techniques range from simple photometric and geometrical transformations, to more complex methods that use visual saliency to craft new training examples. As augmentation methods get more complex, their ability to increas… ▽ More

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Accepted at British Machine Vision Conference (BMVC) 2022. Code: https://github.com/SamsungLabs/SAGE

  49. Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs

    Authors: Phillip Howard, Arden Ma, Vasudev Lal, Ana Paula Simoes, Daniel Korat, Oren Pereg, Moshe Wasserblat, Gadi Singer

    Abstract: The extraction of aspect terms is a critical step in fine-grained sentiment analysis of text. Existing approaches for this task have yielded impressive results when the training and testing data are from the same domain. However, these methods show a drastic decrease in performance when applied to cross-domain settings where the domain of the testing data differs from that of the training data. To… ▽ More

    Submitted 18 October, 2022; originally announced October 2022.

    ACM Class: I.2.7

    Journal ref: Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM 2022). Association for Computing Machinery, New York, NY, USA, 780-790

  50. arXiv:2208.11021  [pdf, other

    cs.CV

    Adversarial Feature Augmentation for Cross-domain Few-shot Classification

    Authors: Yanxu Hu, Andy J. Ma

    Abstract: Existing methods based on meta-learning predict novel-class labels for (target domain) testing tasks via meta knowledge learned from (source domain) training tasks of base classes. However, most existing works may fail to generalize to novel classes due to the probably large domain discrepancy across domains. To address this issue, we propose a novel adversarial feature augmentation (AFA) method t… ▽ More

    Submitted 23 August, 2022; originally announced August 2022.