Showing 1–50 of 232 results for author: Le, Q

Searching in archive cs.
  1. arXiv:2407.15680  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning

    Authors: Zhecan Wang, Garrett Bingham, Adams Yu, Quoc Le, Thang Luong, Golnaz Ghiasi

    Abstract: Hallucination has been a major problem for large language models and remains a critical challenge when it comes to multimodality in which vision-language models (VLMs) have to deal with not just textual but also visual inputs. Despite rapid progress in VLMs, resources for evaluating and addressing multimodal hallucination are limited and mostly focused on evaluation. This work introduces HaloQuest…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted as a main conference paper at ECCV 2024 (https://github.com/google/haloquest)

  2. arXiv:2407.14974  [pdf, other]

    cs.LG cs.AI

    Out of spuriousity: Improving robustness to spurious correlations without group annotations

    Authors: Phuong Quynh Le, Jörg Schlötterer, Christin Seifert

    Abstract: Machine learning models are known to learn spurious correlations, i.e., features having strong relations with class labels but no causal relation. Relying on those correlations leads to poor performance in the data groups without these correlations and poor generalization ability. To improve the robustness of machine learning models to spurious correlations, we propose an approach to extract a sub…

    Submitted 20 July, 2024; originally announced July 2024.

  3. arXiv:2407.13803  [pdf, other]

    cs.CR cs.AI cs.CL

    Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality

    Authors: Duy C. Hoang, Hung T. Q. Le, Rui Chu, Ping Li, Weijie Zhao, Yingjie Lao, Khoa D. Doan

    Abstract: With the widespread adoption of Large Language Models (LLMs), concerns about potential misuse have emerged. To this end, watermarking has been adapted to LLM, enabling a simple and effective way to detect and monitor generated text. However, while the existing methods can differentiate between watermarked and unwatermarked text with high accuracy, they often face a trade-off between the quality of…

    Submitted 17 July, 2024; originally announced July 2024.

  4. arXiv:2407.12372  [pdf, other]

    cs.CC math.OC

    Geometric and computational hardness of bilevel programming

    Authors: Jérôme Bolte, Quoc-Tung Le, Edouard Pauwels, Samuel Vaiter

    Abstract: We first show a simple but striking result in bilevel optimization: unconstrained $C^\infty$ smooth bilevel programming is as hard as general extended-real-valued lower semicontinuous minimization. We then proceed to a worst-case analysis of box-constrained bilevel polynomial optimization. We show in particular that any extended-real-valued semi-algebraic function, possibly non-continuous, can be…

    Submitted 17 July, 2024; originally announced July 2024.

  5. arXiv:2407.07296  [pdf]

    physics.med-ph cs.AI cs.CV

    Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy

    Authors: Praveenbalaji Rajendran, Yong Yang, Thomas R. Niedermayr, Michael Gensheimer, Beth Beadle, Quynh-Thu Le, Lei Xing, Xianjin Dai

    Abstract: Radiation therapy (RT) is one of the most effective treatments for cancer, and its success relies on the accurate delineation of targets. However, target delineation is a comprehensive medical decision that currently relies purely on manual processes by human experts. Manual delineation is time-consuming, laborious, and subject to interobserver variations. Although the advancements in artificial i…

    Submitted 9 July, 2024; originally announced July 2024.

  6. arXiv:2407.01987  [pdf, other]

    cs.CV

    AHMsys: An Automated HVAC Modeling System for BIM Project

    Authors: Long Hoang Dang, Duy-Hung Nguyen, Thai Quang Le, Thinh Truong Nguyen, Clark Mei, Vu Hoang

    Abstract: This paper presents a novel system, named AHMsys, designed to automate the process of generating 3D Heating, Ventilation, and Air Conditioning (HVAC) models from 2D Computer-Aided Design (CAD) drawings, a key component of Building Information Modeling (BIM). By automatically preprocessing and extracting essential HVAC object information then creating detailed 3D models, our proposed AHMsys signifi…

    Submitted 2 July, 2024; originally announced July 2024.

  7. arXiv:2406.18602  [pdf]

    stat.AP cs.LG stat.CO

    Multi-level Phenotypic Models of Cardiovascular Disease and Obstructive Sleep Apnea Comorbidities: A Longitudinal Wisconsin Sleep Cohort Study

    Authors: Duy Nguyen, Ca Hoang, Phat K. Huynh, Tien Truong, Dang Nguyen, Abhay Sharma, Trung Q. Le

    Abstract: Cardiovascular diseases (CVDs) are notably prevalent among patients with obstructive sleep apnea (OSA), posing unique challenges in predicting CVD progression due to the intricate interactions of comorbidities. Traditional models typically lack the necessary dynamic and longitudinal scope to accurately forecast CVD trajectories in OSA patients. This study introduces a novel multi-level phenotypic…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 30 pages, 5 figures, 5 tables

  8. arXiv:2406.15609  [pdf, other]

    physics.med-ph cs.AI

    Automated radiotherapy treatment planning guided by GPT-4Vision

    Authors: Sheng Liu, Oscar Pastor-Serrano, Yizheng Chen, Matthew Gopaulchan, Weixing Liang, Mark Buyyounouski, Erqi Pollom, Quynh-Thu Le, Michael Gensheimer, Peng Dong, Yong Yang, James Zou, Lei Xing

    Abstract: Radiotherapy treatment planning is a time-consuming and potentially subjective process that requires the iterative adjustment of model parameters to balance multiple conflicting objectives. Recent advancements in large foundation models offer promising avenues for addressing the challenges in planning and clinical decision-making. This study introduces GPT-RadPlan, a fully automated treatment plan…

    Submitted 1 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  9. arXiv:2406.04520  [pdf, other]

    cs.CL cs.AI

    NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

    Authors: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou

    Abstract: We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for…

    Submitted 6 June, 2024; originally announced June 2024.

  10. arXiv:2405.15665  [pdf]

    cs.SE

    Examining Ownership Models in Software Teams: A Systematic Literature Review and a Replication Study

    Authors: Umme Ayman Koana, Quang Hy Le, Shadikur Rahman, Chris Carlson, Francis Chew, Maleknaz Nayebi

    Abstract: Effective ownership of software artifacts, particularly code, is crucial for accountability, knowledge sharing, and code quality enhancement. Researchers have proposed models linking ownership of software artifacts with developer performance and code quality. Our study aims to systematically examine various ownership models and provide a structured literature overview. Conducting a systematic lite…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Pre-print of an accepted paper for the ESE journal

  11. arXiv:2405.15013  [pdf, other]

    cs.LG

    Make Inference Faster: Efficient GPU Memory Management for Butterfly Sparse Matrix Multiplication

    Authors: Antoine Gonon, Léon Zheng, Pascal Carrivain, Quoc-Tung Le

    Abstract: This paper is the first to assess the state of existing sparse matrix multiplication algorithms on GPU for the butterfly structure, a promising form of sparsity. This is achieved through a comprehensive benchmark that can be easily modified to add a new implementation. The goal is to provide a simple tool for users to select the optimal implementation based on their settings. Using this benchmark,…

    Submitted 23 May, 2024; originally announced May 2024.

  12. arXiv:2405.00392  [pdf, other]

    cs.CR cs.AI

    Certified Adversarial Robustness of Machine Learning-based Malware Detectors via (De)Randomized Smoothing

    Authors: Daniel Gibert, Luca Demetrio, Giulio Zizzo, Quan Le, Jordi Planes, Battista Biggio

    Abstract: Deep learning-based malware detection systems are vulnerable to adversarial EXEmples - carefully-crafted malicious programs that evade detection with minimal perturbation. As such, the community is dedicating effort to develop mechanisms to defend against adversarial EXEmples. However, current randomized smoothing-based defenses are still vulnerable to attacks that inject blocks of adversarial con…

    Submitted 1 May, 2024; originally announced May 2024.

  13. arXiv:2404.13844  [pdf, other]

    cs.LG cs.AI

    ColA: Collaborative Adaptation with Gradient Learning

    Authors: Enmao Diao, Qi Le, Suya Wu, Xinran Wang, Ali Anwar, Jie Ding, Vahid Tarokh

    Abstract: A primary function of back-propagation is to compute both the gradient of hidden representations and parameters for optimization with gradient descent. Training large models requires high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to save computational space, they still present computational over…

    Submitted 21 April, 2024; originally announced April 2024.

  14. arXiv:2404.11792  [pdf, other]

    cs.AI

    Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study

    Authors: Zooey Nguyen, Anthony Annunziata, Vinh Luong, Sang Dinh, Quynh Le, Anh Hai Ha, Chanh Le, Hong An Phan, Shruti Raghavan, Christopher Nguyen

    Abstract: This paper investigates the impact of domain-specific model fine-tuning and of reasoning mechanisms on the performance of question-answering (Q&A) systems powered by large language models (LLMs) and Retrieval-Augmented Generation (RAG). Using the FinanceBench SEC financial filings dataset, we observe that, for RAG, combining a fine-tuned embedding model with a fine-tuned LLM achieves better accura…

    Submitted 19 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Fixed typo of OODA's score on harder-question set in Table 2

  15. arXiv:2404.09259  [pdf, other]

    cs.CV cs.AI

    FedCCL: Federated Dual-Clustered Feature Contrast Under Domain Heterogeneity

    Authors: Yu Qiao, Huy Q. Le, Mengchun Zhang, Apurba Adhikary, Chaoning Zhang, Choong Seon Hong

    Abstract: Federated learning (FL) facilitates a privacy-preserving neural network training paradigm through collaboration between edge clients and a central server. One significant challenge is that the distributed data is not independently and identically distributed (non-IID), typically including both intra-domain and inter-domain heterogeneity. However, recent research is limited to simply using averaged…

    Submitted 14 April, 2024; originally announced April 2024.

  16. arXiv:2403.18802  [pdf, other]

    cs.CL cs.AI cs.LG

    Long-form factuality in large language models

    Authors: Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

    Abstract: Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factua…

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  17. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  18. arXiv:2403.01417  [pdf, other]

    cs.LG cs.DC

    Asyn2F: An Asynchronous Federated Learning Framework with Bidirectional Model Aggregation

    Authors: Tien-Dung Cao, Nguyen T. Vuong, Thai Q. Le, Hoang V. N. Dao, Tram Truong-Huu

    Abstract: In federated learning, the models can be trained synchronously or asynchronously. Many research works have focused on developing an aggregation method for the server to aggregate multiple local models into the global model with improved performance. They ignore the heterogeneity of the training workers, which causes the delay in the training of the local models, leading to the obsolete information…

    Submitted 3 March, 2024; originally announced March 2024.

  19. A Robust Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via (De)Randomized Smoothing

    Authors: Daniel Gibert, Giulio Zizzo, Quan Le, Jordi Planes

    Abstract: Deep learning-based malware detectors have been shown to be susceptible to adversarial malware examples, i.e. malware examples that have been deliberately manipulated in order to avoid detection. In light of the vulnerability of deep learning detectors to subtle input file modifications, we propose a practical defense against adversarial malware examples inspired by (de)randomized smoothing. In th…

    Submitted 26 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.08906

  20. arXiv:2402.03620  [pdf, other]

    cs.AI cs.CL

    Self-Discover: Large Language Models Self-Compose Reasoning Structures

    Authors: Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng

    Abstract: We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasonin…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 17 pages, 11 figures, 5 tables

  21. arXiv:2401.13898  [pdf, other]

    cs.LG

    Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality

    Authors: Huy Q. Le, Chu Myaet Thwal, Yu Qiao, Ye Lin Tun, Minh N. H. Nguyen, Choong Seon Hong

    Abstract: Multimodal federated learning (MFL) has emerged as a decentralized machine learning paradigm, allowing multiple clients with different modalities to collaborate on training a machine learning model across diverse data sources without sharing their private data. However, challenges, such as data heterogeneity and severely missing modalities, pose crucial hindrances to the robustness of MFL, signifi…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures, 5 tables

  22. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  23. arXiv:2312.08472  [pdf, other]

    cs.NE cs.LG math.NA

    AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions

    Authors: Esteban Real, Yao Chen, Mirko Rossini, Connal de Souza, Manav Garg, Akhil Verghese, Moritz Firsching, Quoc V. Le, Ekin Dogus Cubuk, David H. Park

    Abstract: Computers calculate transcendental functions by approximating them through the composition of a few limited-precision instructions. For example, an exponential can be calculated with a Taylor series. These approximation methods were developed over the centuries by mathematicians, who emphasized the attainability of arbitrary precision. Computers, however, operate on few limited precision types, su…

    Submitted 13 December, 2023; originally announced December 2023.

    ACM Class: I.2.2; I.2.6; G.1.2

  24. arXiv:2312.00763  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses

    Authors: Xiao Ma, Swaroop Mishra, Ariel Liu, Sophie Su, Jilin Chen, Chinmay Kulkarni, Heng-Tze Cheng, Quoc Le, Ed Chi

    Abstract: Large language model (LLM) powered chatbots are primarily text-based today, and impose a large interactional cognitive load, especially for exploratory or sensemaking tasks such as planning a trip or learning about a new city. Because the interaction is textual, users have little scaffolding in the way of structure, informational "scent", or ability to specify high-level preferences or goals. We i…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 19 pages, 11 figures

  25. arXiv:2312.00398  [pdf, other]

    cs.CV

    Learning to Estimate Critical Gait Parameters from Single-View RGB Videos with Transformer-Based Attention Network

    Authors: Quoc Hung T. Le, Hieu H. Pham

    Abstract: Musculoskeletal diseases and cognitive impairments in patients lead to difficulties in movement as well as negative effects on their psychological health. Clinical gait analysis, a vital tool for early diagnosis and treatment, traditionally relies on expensive optical motion capture systems. Recent advances in computer vision and deep learning have opened the door to more accessible and cost-effec…

    Submitted 1 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted at ISBI 2024 (21st IEEE International Symposium on Biomedical Imaging)

  26. arXiv:2311.00737  [pdf]

    cs.LG physics.ins-det physics.med-ph

    Real-Time Magnetic Tracking and Diagnosis of COVID-19 via Machine Learning

    Authors: Dang Nguyen, Phat K. Huynh, Vinh Duc An Bui, Kee Young Hwang, Nityanand Jain, Chau Nguyen, Le Huu Nhat Minh, Le Van Truong, Xuan Thanh Nguyen, Dinh Hoang Nguyen, Le Tien Dung, Trung Q. Le, Manh-Huong Phan

    Abstract: The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through thre…

    Submitted 1 November, 2023; originally announced November 2023.

  27. arXiv:2310.13236  [pdf, other]

    cs.LG

    An Efficient Federated Learning Framework for Training Semantic Communication System

    Authors: Loc X. Nguyen, Huy Q. Le, Ye Lin Tun, Pyae Sone Aung, Yan Kyaw Tun, Zhu Han, Choong Seon Hong

    Abstract: Semantic communication has emerged as a pillar for the next generation of communication systems due to its capabilities in alleviating data redundancy. Most semantic communication systems are built upon advanced deep learning models whose training performance heavily relies on data availability. Existing studies often make unrealistic assumptions of a readily accessible data source, where in pract…

    Submitted 9 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 5 pages, 3 figures

  28. arXiv:2310.06117  [pdf, other]

    cs.LG cs.AI cs.CL

    Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

    Authors: Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, Heng-Tze Cheng, Ed H. Chi, Quoc V Le, Denny Zhou

    Abstract: We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with…

    Submitted 12 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  29. arXiv:2310.03214  [pdf, other]

    cs.CL

    FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

    Authors: Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong

    Abstract: Most large language models (LLMs) are trained once and never updated; thus, they lack the ability to dynamically adapt to our ever-changing world. In this work, we perform a detailed study of the factuality of LLM-generated text in the context of answering questions that test current world knowledge. Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of q…

    Submitted 22 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Preprint, 26 pages, 10 figures, 5 tables; Added FreshEval

  30. arXiv:2309.12323  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Evaluating the diversity and utility of materials proposed by generative models

    Authors: Alexander New, Michael Pekala, Elizabeth A. Pogue, Nam Q. Le, Janna Domenico, Christine D. Piatko, Christopher D. Stiles

    Abstract: Generative machine learning models can use data generated by scientific modeling to create large quantities of novel material structures. Here, we assess how one state-of-the-art generative model, the physics-guided crystal generation model (PGCGM), can be used as part of the inverse design process. We show that the default PGCGM's input space is not smooth with respect to parameter variation, mak…

    Submitted 9 August, 2023; originally announced September 2023.

    Comments: 12 pages, 9 figures. Published at SynS & ML @ ICML2023: https://openreview.net/forum?id=2ZYbmYTKoR

  31. arXiv:2309.03506  [pdf, other]

    cs.CV cs.AI

    Towards Robust Natural-Looking Mammography Lesion Synthesis on Ipsilateral Dual-Views Breast Cancer Analysis

    Authors: Thanh-Huy Nguyen, Quang Hien Kha, Thai Ngoc Toan Truong, Ba Thinh Lam, Ba Hung Ngo, Quang Vinh Dinh, Nguyen Quoc Khanh Le

    Abstract: In recent years, many mammographic image analysis methods have been introduced for improving cancer classification tasks. Two major issues of mammogram classification tasks are leveraging multi-view mammographic information and class-imbalance handling. In the first problem, many multi-view methods have been released for concatenating features of two or more views for the training and inference st…

    Submitted 7 September, 2023; originally announced September 2023.

  32. arXiv:2309.03409  [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Models as Optimizers

    Authors: Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen

    Abstract: Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In eac…

    Submitted 15 April, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: ICLR 2024; 42 pages, 26 figures, 15 tables. Code at https://github.com/google-deepmind/opro

  33. arXiv:2308.12477  [pdf, other]

    cs.CL cs.CV econ.GN

    American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

    Authors: Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D'Amico-Wong, Quan Le, Pablo Querubin, Leander Heldring

    Abstract: Existing full text datasets of U.S. public domain newspapers do not recognize the often complex layouts of newspaper scans, and as a result the digitized content scrambles texts from articles, headlines, captions, advertisements, and other layout regions. OCR quality can also be low. This study develops a novel, deep learning pipeline for extracting full article texts from newspaper images and app…

    Submitted 23 August, 2023; originally announced August 2023.

  34. Towards a Practical Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via Randomized Smoothing

    Authors: Daniel Gibert, Giulio Zizzo, Quan Le

    Abstract: Malware detectors based on deep learning (DL) have been shown to be susceptible to malware examples that have been deliberately manipulated in order to evade detection, a.k.a. adversarial malware examples. More specifically, it has been shown that deep learning detectors are vulnerable to small changes on the input file. Given this vulnerability of deep learning detectors, we propose a practical de…

    Submitted 17 August, 2023; originally announced August 2023.

  35. arXiv:2308.07187  [pdf, ps, other]

    cs.IT cs.CC math.AC

    On the Asymptotic Nonnegative Rank of Matrices and its Applications in Information Theory

    Authors: Yeow Meng Chee, Quoc Tung Le, Hoang Ta

    Abstract: In this paper, we study the asymptotic nonnegative rank of matrices, which characterizes the asymptotic growth of the nonnegative rank of fixed nonnegative matrices under the Kronecker product. This quantity is important since it governs several notions in information theory such as the so-called exact Rényi common information and the amortized communication complexity. By using the theory of asym…

    Submitted 29 January, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

  36. arXiv:2308.03958  [pdf, other]

    cs.CL

    Simple synthetic data reduces sycophancy in large language models

    Authors: Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le

    Abstract: Sycophancy is an undesirable behavior where models tailor their responses to follow a human user's view even when that view is not objectively correct (e.g., adapting liberal views once a user reveals that they are liberal). In this paper, we study the prevalence of sycophancy in language models and propose a simple synthetic-data intervention to reduce this behavior. First, on a set of three sy…

    Submitted 14 February, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

  37. arXiv:2308.03290  [pdf, other]

    cs.CV cs.LG

    FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search

    Authors: Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li

    Abstract: Quantization has become a mainstream compression technique for reducing model size, computational requirements, and energy consumption for modern deep neural networks (DNNs). With improved numerical support in recent hardware, including multiple variants of integer and floating point, mixed-precision quantization has become necessary to achieve high-quality results with low model cost. Prior mixed…

    Submitted 1 May, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Accepted to AutoML 2024

  38. arXiv:2308.00473  [pdf, other]

    cs.LG cs.CV

    Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?

    Authors: Phuong Quynh Le, Jörg Schlötterer, Christin Seifert

    Abstract: Models trained with empirical risk minimization (ERM) are known to learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features which are strongly correlated with class labels but lack causal reasoning. This behavior particularly degrades accuracy in groups of samples of the correlated class that are missing the spurious feature or samples of the opposite cla…

    Submitted 9 January, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: Accepted at IJCAI Workshop on XAI 2023

  39. arXiv:2307.13214  [pdf, other]

    cs.LG cs.AI

    FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

    Authors: Huy Q. Le, Minh N. H. Nguyen, Chu Myaet Thwal, Yu Qiao, Chaoning Zhang, Choong Seon Hong

    Abstract: Federated learning (FL) enables a decentralized machine learning paradigm for multiple clients to collaboratively train a generalized global model without sharing their private data. Most existing works simply propose typical FL systems for single-modal data, thus limiting its potential on exploiting valuable multimodal data for future personalized applications. Furthermore, the majority of FL app…

    Submitted 6 November, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

  40. arXiv:2307.10575  [pdf, other]

    cs.LG cs.AI cs.CV

    Boosting Federated Learning Convergence with Prototype Regularization

    Authors: Yu Qiao, Huy Q. Le, Choong Seon Hong

    Abstract: As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneit…

    Submitted 20 July, 2023; originally announced July 2023.

  41. Query-Free Evasion Attacks Against Machine Learning-Based Malware Detectors with Generative Adversarial Networks

    Authors: Daniel Gibert, Jordi Planes, Quan Le, Giulio Zizzo

    Abstract: Malware detectors based on machine learning (ML) have been shown to be susceptible to adversarial malware examples. However, current methods to generate adversarial malware examples still have their limits. They either rely on detailed model information (gradient-based attacks), or on detailed outputs of the model - such as class probabilities (score-based attacks), neither of which are available…

    Submitted 16 June, 2023; originally announced June 2023.

    Journal ref: 2023 IEEE European Symposium on Security and Privacy Workshops

  42. arXiv:2306.02666  [pdf, other]

    cs.NE math.AG math.FA math.OC

    Does a sparse ReLU network training problem always admit an optimum?

    Authors: Quoc-Tung Le, Elisa Riccietti, Rémi Gribonval

    Abstract: Given a training set, a loss function, and a neural network architecture, it is often taken for granted that optimal network parameters exist, and a common practice is to apply available optimization algorithms to search for them. In this work, we show that the existence of an optimal solution is not always guaranteed, especially in the context of sparse ReLU neural networks. In particular,…

    Submitted 5 December, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 - Thirty-seventh Conference on Neural Information Processing Systems, Dec 2023, New Orleans (Louisiana), United States

  43. arXiv:2306.00008  [pdf, other]

    cs.LG cs.CL

    Brainformers: Trading Simplicity for Efficiency

    Authors: Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

    Abstract: Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this in…

    Submitted 25 April, 2024; v1 submitted 29 May, 2023; originally announced June 2023.

  44. arXiv:2305.17052  [pdf, other]

    cs.LG cs.AI cs.CY cs.GT cs.MA

    A Framework for Incentivized Collaborative Learning

    Authors: Xinran Wang, Qi Le, Ahmad Faraz Khan, Jie Ding, Ali Anwar

    Abstract: Collaborations among various entities, such as companies, research labs, AI agents, and edge devices, have become increasingly crucial for achieving machine learning tasks that cannot be accomplished by a single entity alone. This is likely due to factors such as security constraints, privacy concerns, and limitations in computation resources. As a result, collaborative learning (CL) research has…

    Submitted 26 May, 2023; originally announced May 2023.

  45. arXiv:2305.10429  [pdf, other]

    cs.CL cs.LG

    DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

    Authors: Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

    Abstract: The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of do…

    Submitted 20 November, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  46. arXiv:2305.08298  [pdf, other]

    cs.CL

    Symbol tuning improves in-context learning in language models

    Authors: Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le

    Abstract: We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar"). Symbol tuning leverages the intuition that when a model cannot use instructions or natural language labels to figure out a task, it must instead do so by learning the input-label mappings.…

    Submitted 30 December, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  47. arXiv:2304.10553   

    cs.LG cs.AI cs.CR

    Sparsity in neural networks can improve their privacy

    Authors: Antoine Gonon, Léon Zheng, Clément Lalanne, Quoc-Tung Le, Guillaume Lauga, Can Pouliquen

    Abstract: This article measures how sparsity can make neural networks more robust to membership inference attacks. The obtained empirical results show that sparsity improves the privacy of the network, while preserving comparable performances on the task at hand. This empirical study completes and extends existing literature.

    Submitted 11 June, 2024; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: duplicate of arXiv:2304.07234

  48. arXiv:2304.07514  [pdf, other]

    cs.LG cs.AI

    PI-FL: Personalized and Incentivized Federated Learning

    Authors: Ahmad Faraz Khan, Xinran Wang, Qi Le, Azal Ahmad Khan, Haider Ali, Jie Ding, Ali Butt, Ali Anwar

    Abstract: Personalized FL has been widely used to cater to heterogeneity challenges with non-IID data. A primary obstacle is considering the personalization process from the client's perspective to preserve their autonomy. Allowing the clients to participate in personalized FL decisions becomes significant due to privacy and security concerns, where the clients may not be at liberty to share private informa…

    Submitted 27 April, 2023; v1 submitted 15 April, 2023; originally announced April 2023.

  49. arXiv:2304.07234  [pdf, other]

    cs.CR cs.LG

    Can sparsity improve the privacy of neural networks?

    Authors: Antoine Gonon, Léon Zheng, Clément Lalanne, Quoc-Tung Le, Guillaume Lauga, Can Pouliquen

    Abstract: Sparse neural networks are mainly motivated by resource efficiency since they use fewer parameters than their dense counterparts but still reach comparable accuracies. This article empirically investigates whether sparsity could also improve the privacy of the data used to train the networks. The experiments show positive correlations between the sparsity of the model, its privacy, and its classi…

    Submitted 23 May, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

  50. arXiv:2304.01950  [pdf, other]

    cs.LG cs.AI cs.CV cs.DC

    MP-FedCL: Multiprototype Federated Contrastive Learning for Edge Intelligence

    Authors: Yu Qiao, Md. Shirajum Munir, Apurba Adhikary, Huy Q. Le, Avi Deb Raha, Chaoning Zhang, Choong Seon Hong

    Abstract: Federated learning-assisted edge intelligence enables privacy protection in modern intelligent services. However, non-independent and identically distributed (non-IID) data among edge clients can impair the local model performance. The existing single prototype-based strategy represents a class by using the mean of the feature space. However, feature spaces are usually not clustered, and a…

    Submitted 11 October, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE Internet of Things