Showing 1–50 of 945 results for author: Chen, P

Searching in archive cs.
  1. arXiv:2409.02038  [pdf, other]

    cs.CL cs.AI cs.DB

    BEAVER: An Enterprise Benchmark for Text-to-SQL

    Authors: Peter Baile Chen, Fabian Wenz, Yi Zhang, Moe Kayali, Nesime Tatbul, Michael Cafarella, Çağatay Demiralp, Michael Stonebraker

    Abstract: Existing text-to-SQL benchmarks have largely been constructed using publicly available tables from the web with human-generated tests containing question and SQL statement pairs. They typically show very good results and lead people to think that LLMs are effective at text-to-SQL tasks. In this paper, we apply off-the-shelf LLMs to a benchmark containing enterprise data warehouse data. In this env…

    Submitted 3 September, 2024; originally announced September 2024.

  2. arXiv:2409.01821  [pdf, other]

    cs.CV cs.LG

    When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective

    Authors: Hsi-Ai Tsao, Lei Hsiung, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: Adapting pre-trained models to new tasks can exhibit varying effectiveness across datasets. Visual prompting, a state-of-the-art parameter-efficient transfer learning method, can significantly improve the performance of out-of-distribution tasks. On the other hand, linear probing, a standard transfer learning method, can sometimes become the best approach. We propose a log-likelihood ratio (LLR) a…

    Submitted 4 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  3. Learning to Discover Forgery Cues for Face Forgery Detection

    Authors: Jiahe Tian, Peng Chen, Cai Yu, Xiaomeng Fu, Xi Wang, Jiao Dai, Jizhong Han

    Abstract: Locating manipulation maps, i.e., pixel-level annotation of forgery cues, is crucial for providing interpretable detection results in face forgery detection. Related learning objects have also been widely adopted as auxiliary tasks to improve the classification performance of detectors whereas they require comparisons between paired real and forged faces to obtain manipulation maps as supervision.…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: TIFS 2024

  4. arXiv:2409.00127  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data

    Authors: Phillip Si, Peng Chen

    Abstract: Accurate modeling and prediction of complex physical systems often rely on data assimilation techniques to correct errors inherent in model simulations. Traditional methods like the Ensemble Kalman Filter (EnKF) and its variants as well as the recently developed Ensemble Score Filters (EnSF) face significant challenges when dealing with high-dimensional and nonlinear Bayesian filtering problems wi…

    Submitted 29 August, 2024; originally announced September 2024.

    Comments: 13 pages, 10 figures, 1 table

    MSC Class: 68U01

    ACM Class: J.2; I.2.1

  5. arXiv:2408.16756  [pdf, other]

    cs.CL

    How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models

    Authors: Jiyue Jiang, Liheng Chen, Pengan Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

    Abstract: The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong…

    Submitted 29 August, 2024; originally announced August 2024.

  6. arXiv:2408.15366  [pdf, other]

    cs.CL

    Pitfalls and Outlooks in Using COMET

    Authors: Vilém Zouhar, Pinzhen Chen, Tsz Kin Lam, Nikita Moghe, Barry Haddow

    Abstract: Since its introduction, the COMET metric has blazed a trail in the machine translation community, given its strong correlation with human judgements of translation quality. Its success stems from being a modified pre-trained multilingual model finetuned for quality assessment. However, it being a machine learning model also gives rise to a new set of pitfalls that may not be widely known. We inves…

    Submitted 2 September, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

  7. arXiv:2408.15101  [pdf, other]

    cs.CV cs.AI

    MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

    Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, Ying-Cong Chen

    Abstract: Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring a Mamba-based decoder. It contains two…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.02228

  8. arXiv:2408.13750  [pdf, other]

    cs.AI cs.MA

    Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective

    Authors: Qi Liu, Jianqi Gao, Dongjie Zhu, Xizheng Pang, Pengbin Chen, Jingxiang Guo, Yanjie Li

    Abstract: Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouse. However, most literature only addresses one of these two problems separately. In this study, we propose a method to simultaneously solve target assignment and path planning from a perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the fir…

    Submitted 25 August, 2024; originally announced August 2024.

  9. arXiv:2408.13697  [pdf, other]

    cs.CV

    Guided and Fused: Efficient Frozen CLIP-ViT with Feature Guidance and Multi-Stage Feature Fusion for Generalizable Deepfake Detection

    Authors: Yingjian Chen, Lei Zhang, Yakun Niu, Pei Chen, Lei Tan, Jing Zhou

    Abstract: The rise of generative models has sparked concerns about image authenticity online, highlighting the urgent need for an effective and general detector. Recent methods leveraging the frozen pre-trained CLIP-ViT model have made great progress in deepfake detection. However, these models often rely on visual-general features directly extracted by the frozen network, which contain excessive informatio…

    Submitted 24 August, 2024; originally announced August 2024.

  10. arXiv:2408.12780  [pdf, other]

    cs.CL

    Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation

    Authors: Vivek Iyer, Bhavitvya Malik, Pavel Stepachev, Pinzhen Chen, Barry Haddow, Alexandra Birch

    Abstract: Despite the recent popularity of Large Language Models (LLMs) in Machine Translation (MT), their performance in low-resource translation still lags significantly behind Neural Machine Translation (NMT) models. In this paper, we explore what it would take to adapt LLMs for low-resource settings. In particular, we re-examine the role of two factors: a) the importance and application of parallel data…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 10 pages, 6 figures

  11. arXiv:2408.10099  [pdf, other]

    cs.GR

    Neural Representation of Shape-Dependent Laplacian Eigenfunctions

    Authors: Yue Chang, Otman Benchekroun, Maurizio M. Chiaramonte, Peter Yichen Chen, Eitan Grinspun

    Abstract: The eigenfunctions of the Laplace operator are essential in mathematical physics, engineering, and geometry processing. Typically, these are computed by discretizing the domain and performing eigendecomposition, tying the results to a specific mesh. However, this method is unsuitable for continuously-parameterized shapes. We propose a novel representation for eigenfunctions in continuously-param…

    Submitted 19 August, 2024; originally announced August 2024.

  12. arXiv:2408.09065  [pdf, other]

    cs.CV cs.AI cs.LG

    Linking Robustness and Generalization: A k* Distribution Analysis of Concept Clustering in Latent Space for Vision Models

    Authors: Shashank Kotyan, Pin-Yu Chen, Danilo Vasconcellos Vargas

    Abstract: Most evaluations of vision models use indirect methods to assess latent space quality. These methods often involve adding extra layers to project the latent space into a new one. This projection makes it difficult to analyze and compare the original latent space. This article uses the k* Distribution, a local neighborhood analysis method, to examine the learned latent space at the level of individ…

    Submitted 16 August, 2024; originally announced August 2024.

  13. arXiv:2408.09030  [pdf, other]

    cs.CL cs.HC

    Studying the Effects of Collaboration in Interactive Theme Discovery Systems

    Authors: Alvin Po-Chun Chen, Dananjay Srinivas, Alexandra Barry, Maksim Seniw, Maria Leonor Pacheco

    Abstract: NLP-assisted solutions have gained considerable traction to support qualitative data analysis. However, there does not exist a unified evaluation framework that can account for the many different settings in which qualitative researchers may employ them. In this paper, we take a first step in this direction by proposing an evaluation framework to study the way in which different tools may result i…

    Submitted 16 August, 2024; originally announced August 2024.

  14. arXiv:2408.07303  [pdf]

    cs.CV cs.CL cs.LG

    Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion

    Authors: Peiyuan Chen, Zecheng Zhang, Yiping Dong, Li Zhou, Han Wang

    Abstract: Visual Question Answering (VQA) is a challenging task that requires systems to provide accurate answers to questions based on image content. Current VQA models struggle with complex questions due to limitations in capturing and integrating multimodal information effectively. To address these challenges, we propose the Rank VQA model, which leverages a ranking-inspired hybrid training strategy to e…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Visual Question Answering, Rank VQA, Faster R-CNN, BERT, Multimodal Fusion, Ranking Learning, Hybrid Training Strategy

  15. arXiv:2408.05416  [pdf, other]

    cs.CV cs.AI cs.MM

    High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model

    Authors: Weizhi Zhong, Junfan Lin, Peixin Chen, Liang Lin, Guanbin Li

    Abstract: Audio-driven talking face video generation has attracted increasing attention due to its huge industrial potential. Some previous methods focus on learning a direct mapping from audio to visual content. Despite progress, they often struggle with the ambiguity of the mapping process, leading to flawed results. An alternative strategy involves facial structural representations (e.g., facial landmark…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: submitted to IEEE Transactions on Image Processing (TIP)

  16. arXiv:2408.03361  [pdf, other]

    eess.IV cs.CV

    GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

    Authors: Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao

    Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Curren…

    Submitted 9 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  17. arXiv:2408.01365  [pdf, ps, other]

    cs.CC cs.LG

    Data Debugging is NP-hard for Classifiers Trained with SGD

    Authors: Zizheng Guo, Pengyu Chen, Yanzhang Fu, Dongjing Miao

    Abstract: Data debugging is to find a subset of the training data such that the model obtained by retraining on the subset has a better accuracy. A bunch of heuristic approaches are proposed, however, none of them are guaranteed to solve this problem effectively. This leaves an open issue whether there exists an efficient algorithm to find the subset such that the model obtained by retraining on it has a be…

    Submitted 2 August, 2024; originally announced August 2024.

  18. arXiv:2407.20730  [pdf, other]

    cs.CV

    Autogenic Language Embedding for Coherent Point Tracking

    Authors: Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

    Abstract: Point tracking is a challenging task in computer vision, aiming to establish point-wise correspondence across long video sequences. Recent advancements have primarily focused on temporal modeling techniques to improve local feature similarity, often overlooking the valuable semantic consistency inherent in tracked points. In this paper, we introduce a novel approach leveraging language embeddings…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: accepted by ACM MM 2024

  19. arXiv:2407.20372  [pdf, other]

    cs.CV

    A Model Generalization Study in Localizing Indoor Cows with COw LOcalization (COLO) dataset

    Authors: Mautushi Das, Gonzalo Ferreira, C. P. James Chen

    Abstract: Precision livestock farming (PLF) increasingly relies on advanced object localization techniques to monitor livestock health and optimize resource management. This study investigates the generalization capabilities of YOLOv8 and YOLOv9 models for cow detection in indoor free-stall barn settings, focusing on varying training data characteristics such as view angles and lighting, and model complexit…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

    MSC Class: C.4; E.0

  20. arXiv:2407.20256  [pdf]

    cs.DB cs.AI cs.LG

    Making LLMs Work for Enterprise Data Tasks

    Authors: Çağatay Demiralp, Fabian Wenz, Peter Baile Chen, Moe Kayali, Nesime Tatbul, Michael Stonebraker

    Abstract: Large language models (LLMs) know little about enterprise database tables in the private data ecosystem, which substantially differ from web text in structure and content. As LLMs' performance is tied to their training data, a crucial question is how useful they can be in improving enterprise database management and analysis tasks. To address this, we contribute experimental results on LLMs' perfo…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Poster at North East Database Day 2024

  21. arXiv:2407.20228  [pdf, other]

    cs.CV

    FlexAttention for Efficient High-Resolution Vision-Language Models

    Authors: Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan

    Abstract: Current high-resolution vision-language models encode images as high-resolution image tokens and exhaustively take all these tokens to compute attention, which significantly increases the computational cost. To address this problem, we propose FlexAttention, a flexible attention mechanism for efficient high-resolution vision-language models. Specifically, a high-resolution image is encoded both as…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  22. arXiv:2407.19512  [pdf, other]

    cs.CV

    Large-scale cervical precancerous screening via AI-assisted cytology whole slide image analysis

    Authors: Honglin Li, Yusuan Sun, Chenglu Zhu, Yunlong Zhang, Shichuan Zhang, Zhongyi Shui, Pingyi Chen, Jingxiong Li, Sunyi Zheng, Can Cui, Lin Yang

    Abstract: Cervical Cancer continues to be the leading gynecological malignancy, posing a persistent threat to women's health on a global scale. Early screening via cytology Whole Slide Image (WSI) diagnosis is critical to prevent this Cancer progression and improve survival rate, but pathologist's single test suffers inevitable false negative due to the immense number of cells that need to be reviewed withi…

    Submitted 28 July, 2024; originally announced July 2024.

  23. arXiv:2407.16999  [pdf, other]

    cs.LG cs.AI cs.HC

    SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing

    Authors: Changchang Yin, Pin-Yu Chen, Bingsheng Yao, Dakuo Wang, Jeffrey Caterino, Ping Zhang

    Abstract: Sepsis is the leading cause of in-hospital mortality in the USA. Early sepsis onset prediction and diagnosis could significantly improve the survival of sepsis patients. Existing predictive models are usually trained on high-quality data with few missing information, while missing values widely exist in real-world clinical scenarios (especially in the first hours of admissions to the hospital), wh…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: To be published in KDD 2024

    MSC Class: 68T07 (primary) 92C50 (secondary)

    ACM Class: H.2.8; I.2.1; J.3

  24. arXiv:2407.16682  [pdf, other]

    cs.CV

    SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation

    Authors: Pengfei Chen, Lingxi Xie, Xinyue Huo, Xuehui Yu, Xiaopeng Zhang, Yingfei Sun, Zhenjun Han, Qi Tian

    Abstract: The Segment Anything model (SAM) has shown a generalized ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges. This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation. Specifically, given a set of classes (in texts) and a set of SAM patch…

    Submitted 23 July, 2024; originally announced July 2024.

  25. arXiv:2407.15346  [pdf, other]

    cs.CV cs.CL cs.MM

    Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

    Authors: Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, QianYing Wang, Yaqiang Wu, Guang Dai, Ping Chen

    Abstract: Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acq…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Pre-print

  26. arXiv:2407.14936  [pdf, other]

    cs.MM

    EidetiCom: A Cross-modal Brain-Computer Semantic Communication Paradigm for Decoding Visual Perception

    Authors: Linfeng Zheng, Peilin Chen, Shiqi Wang

    Abstract: Brain-computer interface (BCI) facilitates direct communication between the human brain and external systems by utilizing brain signals, eliminating the need for conventional communication methods such as speaking, writing, or typing. Nevertheless, the continuous generation of brain signals in BCI frameworks poses challenges for efficient storage and real-time transmission. While considering the h…

    Submitted 20 July, 2024; originally announced July 2024.

  27. arXiv:2407.14541  [pdf]

    physics.soc-ph cs.CY cs.LG

    Mitigating biases in big mobility data: a case study of monitoring large-scale transit systems

    Authors: Feilong Wang, Xuegang Ban, Peng Chen, Chenxi Liu, Rong Zhao

    Abstract: Big mobility datasets (BMD) have shown many advantages in studying human mobility and evaluating the performance of transportation systems. However, the quality of BMD remains poorly understood. This study evaluates biases in BMD and develops mitigation methods. Using Google and Apple mobility data as examples, this study compares them with benchmark data from governmental agencies. Spatio-tempora…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 10 figures. Transportation Letters. August 2024

  28. arXiv:2407.12445  [pdf, other]

    cs.LG cs.CY

    A Comprehensive Sustainable Framework for Machine Learning and Artificial Intelligence

    Authors: Roberto Pagliari, Peter Hill, Po-Yu Chen, Maciej Dabrowny, Tingsheng Tan, Francois Buet-Golfouse

    Abstract: In financial applications, regulations or best practices often lead to specific requirements in machine learning relating to four key pillars: fairness, privacy, interpretability and greenhouse gas emissions. These all sit in the broader context of sustainability in AI, an emerging practical AI topic. However, although these pillars have been individually addressed by past literature, none of thes…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures, 4 tables, ECAI 24'

    ACM Class: I.2.0

  29. arXiv:2407.11998  [pdf, other]

    cs.HC

    Custom Cloth Creation and Virtual Try-on for Everyone

    Authors: Pei Chen, Heng Wang, Sainan Sun, Zhiyuan Chen, Zhenkun Liu, Shuhua Cao, Li Yang, Minghui Yang

    Abstract: This demo showcases a simple tool that utilizes AIGC technology, enabling both professional designers and regular users to easily customize clothing for their digital avatars. Customization options include changing clothing colors, textures, logos, and patterns. Compared with traditional 3D modeling processes, our approach significantly enhances efficiency and interactivity and reduces production…

    Submitted 13 June, 2024; originally announced July 2024.

  30. arXiv:2407.10625  [pdf, other]

    cs.CV

    WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models

    Authors: Zijian He, Peixin Chen, Guangrun Wang, Guanbin Li, Philip H. S. Torr, Liang Lin

    Abstract: Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions, limiting their effectiveness in video try-on applications. Moreover, video-based models require extensive, high-quality data and…

    Submitted 15 July, 2024; originally announced July 2024.

  31. arXiv:2407.09486  [pdf, other]

    cs.DC cs.AI

    ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving

    Authors: Tao Huang, Pengfei Chen, Kyoka Gong, Jocky Hawk, Zachary Bright, Wenxin Xie, Kecheng Huang, Zhi Ji

    Abstract: Since the increasing popularity of large language model (LLM) backend systems, it is common and necessary to deploy stable serverless serving of LLM on multi-GPU clusters with autoscaling. However, there exist challenges because the diversity and co-location of applications in multi-GPU clusters will lead to low service quality and GPU utilization. To address them, we build ENOVA, a deployment, mo…

    Submitted 17 May, 2024; originally announced July 2024.

  32. arXiv:2407.08584  [pdf, other]

    cs.DC

    Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions

    Authors: Hailiang Zhao, Xueyan Tang, Peng Chen, Jianwei Yin, Shuiguang Deng

    Abstract: This paper investigates a data-locality-aware task assignment and scheduling problem aimed at minimizing job completion times for distributed job executions. Without prior knowledge of future job arrivals, we propose an optimal balanced task assignment algorithm (OBTA) that minimizes the completion time of each arriving job. We significantly reduce OBTA's computational overhead by narrowing the se…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  33. arXiv:2407.08440  [pdf, other]

    cs.CL cs.AI

    Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models

    Authors: Wangtao Sun, Chenxiang Zhang, Xueyou Zhang, Ziyang Huang, Haotian Xu, Pei Chen, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Although Large Language Models (LLMs) have demonstrated strong instruction-following ability, they are further supposed to be controlled and guided by rules in real-world scenarios to be safe, accurate, and intelligent. This demands the possession of inferential rule-following capability of LLMs. However, few works have made a clear evaluation of the inferential rule-following capability of LLMs.…

    Submitted 18 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  34. arXiv:2407.06536  [pdf, other]

    cs.NE

    A Two-stage Evolutionary Framework For Multi-objective Optimization

    Authors: Peng Chen, Jing Liang, Kangjia Qiao, Ponnuthurai Nagaratnam Suganthan, Xuanxuan Ban

    Abstract: In the field of evolutionary multi-objective optimization, the approximation of the Pareto front (PF) is achieved by utilizing a collection of representative candidate solutions that exhibit desirable convergence and diversity. Although several multi-objective evolutionary algorithms (MOEAs) have been designed, they still have difficulties in keeping balance between convergence and diversity of po…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by the CEC conference of WCCI2024

  35. arXiv:2407.05603  [pdf, other]

    cs.CV cs.AI

    WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering

    Authors: Pingyi Chen, Chenglu Zhu, Sunyi Zheng, Honglin Li, Lin Yang

    Abstract: Whole slide imaging is routinely adopted for carcinoma diagnosis and prognosis. Abundant experience is required for pathologists to achieve accurate and reliable diagnostic results of whole slide images (WSI). The huge size and heterogeneous features of WSIs make the workflow of pathological reading extremely time-consuming. In this paper, we propose a novel framework (WSI-VQA) to interpret WSIs b…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  36. arXiv:2407.04297  [pdf, other]

    cs.CR

    HuntFUZZ: Enhancing Error Handling Testing through Clustering Based Fuzzing

    Authors: Jin Wei, Ping Chen, Jun Dai, Xiaoyan Sun, Zhihao Zhang, Chang Xu, Yi Wanga

    Abstract: Testing a program's capability to effectively handle errors is a significant challenge, given that program errors are relatively uncommon. To solve this, Software Fault Injection (SFI)-based fuzzing integrates SFI and traditional fuzzing, injecting and triggering errors for testing (error handling) code. However, we observe that current SFI-based fuzzing approaches have overlooked the correlatio…

    Submitted 5 July, 2024; originally announced July 2024.

  37. arXiv:2407.04294  [pdf, other]

    cs.CR

    SQLaser: Detecting DBMS Logic Bugs with Clause-Guided Fuzzing

    Authors: Jin Wei, Ping Chen, Kangjie Lu, Jun Dai, Xiaoyan Sun

    Abstract: Database Management Systems (DBMSs) are vital components in modern data-driven systems. Their complexity often leads to logic bugs, which are implementation errors within the DBMSs that can lead to incorrect query results, data exposure, unauthorized access, etc., without necessarily causing visible system failures. Existing detection employs two strategies: rule-based bug detection and coverage-g…

    Submitted 5 July, 2024; originally announced July 2024.

  38. arXiv:2407.03925  [pdf, other]

    cs.LG

    Reduced-Order Neural Operators: Learning Lagrangian Dynamics on Highly Sparse Graphs

    Authors: Hrishikesh Viswanath, Yue Chang, Julius Berner, Peter Yichen Chen, Aniket Bera

    Abstract: We present a neural operator architecture to simulate Lagrangian dynamics, such as fluid flow, granular flows, and elastoplasticity. Traditional numerical methods, such as the finite element method (FEM), suffer from long run times and large memory consumption. On the other hand, approaches based on graph neural networks are faster but still suffer from long computation times on dense graphs, whic…

    Submitted 4 July, 2024; originally announced July 2024.

  39. arXiv:2407.03672  [pdf, other]

    cs.LG cs.AI

    A Survey of Data Synthesis Approaches

    Authors: Hsin-Yu Chang, Pei-Yu Chen, Tun-Hsiang Chou, Chang-Sheng Kao, Hsuan-Yun Yu, Yen-Ting Lin, Yun-Nung Chen

    Abstract: This paper provides a detailed survey of synthetic data techniques. We first discuss the expected goals of using synthetic data in data augmentation, which can be divided into four parts: 1) Improving Diversity, 2) Data Balancing, 3) Addressing Domain Shift, and 4) Resolving Edge Cases. Synthesizing data are closely related to the prevailing machine learning techniques at the time, therefore, we s…

    Submitted 4 July, 2024; originally announced July 2024.

  40. arXiv:2407.02228  [pdf, other]

    cs.CV cs.AI

    MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

    Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, Ying-Cong Chen

    Abstract: Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-t…

    Submitted 14 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  41. arXiv:2407.00579  [pdf, ps, other]

    cs.IT eess.SP

    Active-RIS-Aided Covert Communications in NOMA-Inspired ISAC Wireless Systems

    Authors: Miaomiao Zhu, Pengxu Chen, Liang Yang, Alexandros-Apostolos A. Boulogeorgos, Theodoros A. Tsiftsis, Hongwu Liu

    Abstract: Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, whereas facing privacy and security challenges due to open wireless propagation. In this paper, active reconfigurable intelligent surface (RIS) is employed to aid covert communications in NOMA-inspired ISAC wireless system with the aim o…

    Submitted 29 June, 2024; originally announced July 2024.

  42. arXiv:2407.00125  [pdf, other]

    cs.SE cs.AI cs.DC

    A Survey on Failure Analysis and Fault Injection in AI Systems

    Authors: Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng

    Abstract: The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ens…

    Submitted 27 June, 2024; originally announced July 2024.

  43. arXiv:2406.19703  [pdf, other]

    cs.CV

    Vision Transformer with Key-select Routing Attention for Single Image Dehazing

    Authors: Lihan Tong, Weijia Li, Qingxia Yang, Liyuan Chen, Peng Chen

    Abstract: We present Ksformer, utilizing Multi-scale Key-select Routing Attention (MKRA) for intelligent selection of key areas through multi-channel, multi-scale windows with a top-k operator, and Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features, outperforming other dehazing methods in tests.

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, IEICE Trans. Information and Systems

    Report number: Vol.E107-D, No.11, pp.-, Nov. 2024; MSC Class: 68U10 (Primary); ACM Class: I.4

  44. arXiv:2406.19622  [pdf, other]

    cs.LG cs.AI

    Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness

    Authors: Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee

    Abstract: The security and robustness of deep neural networks (DNNs) have become increasingly concerning. This paper aims to provide both a theoretical foundation and a practical solution to ensure the reliability of DNNs. We explore the concept of Lipschitz continuity to certify the robustness of DNNs against adversarial attacks, which aim to mislead the network with adding imperceptible perturbations into…

    Submitted 27 June, 2024; originally announced June 2024.

  45. arXiv:2406.18862  [pdf, other]

    cs.SD eess.AS

    Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study

    Authors: Peikun Chen, Sining Sun, Changhao Shan, Qing Yang, Lei Xie

    Abstract: Unified speech-text models like SpeechGPT, VioLA, and AudioPaLM have shown impressive performance across various speech-related tasks, especially in Automatic Speech Recognition (ASR). These models typically adopt a unified method to model discrete speech and text tokens, followed by training a decoder-only transformer. However, they are all designed for non-streaming ASR tasks, where the entire s…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  46. arXiv:2406.18197  [pdf, other]

    cs.CV

    Human-Free Automated Prompting for Vision-Language Anomaly Detection: Prompt Optimization with Meta-guiding Prompt Scheme

    Authors: Pi-Wei Chen, Jerry Chun-Wei Lin, Jia Ji, Feng-Hao Yeh, Chao-Chun Chen

    Abstract: Pre-trained vision-language models (VLMs) are highly adaptable to various downstream tasks through few-shot learning, making prompt-based anomaly detection a promising approach. Traditional methods depend on human-crafted prompts that require prior knowledge of specific anomaly types. Our goal is to develop a human-free prompt-based anomaly detection framework that optimally learns prompts through…

    Submitted 30 August, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  47. arXiv:2406.17167  [pdf, other]

    cs.LG

    Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis

    Authors: Hongkang Li, Meng Wang, Shuai Zhang, Sijia Liu, Pin-Yu Chen

    Abstract: Efficient training and inference algorithms, such as low-rank adaption and model pruning, have shown impressive performance for learning Transformer-based large foundation models. However, due to the technical challenges of the non-convex optimization caused by the complicated architecture of Transformers, the theoretical study of why these methods can be applied to learn Transformers is mostly el…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: IEEE SAM Workshop 2024

  48. arXiv:2406.15396  [pdf, other]

    cs.CV cs.AI cs.LG

    Feature Purified Transformer With Cross-level Feature Guiding Decoder For Multi-class OOD and Anomaly Detection

    Authors: Jerry Chun-Wei Lin, Pi-Wei Chen, Chao-Chun Chen

    Abstract: Reconstruction networks are prevalently used in unsupervised anomaly and Out-of-Distribution (OOD) detection due to their independence from labeled anomaly data. However, in multi-class datasets, the effectiveness of anomaly detection is often compromised by the models' generalized reconstruction capabilities, which allow anomalies to blend within the expanded boundaries of normality resulting fro…

    Submitted 30 April, 2024; originally announced June 2024.

    Comments: 12 pages

  49. arXiv:2406.13975  [pdf, other]

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  50. arXiv:2406.12822  [pdf, other]

    cs.CL cs.AI

    Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?

    Authors: Pinzhen Chen, Simon Yu, Zhicheng Guo, Barry Haddow

    Abstract: Large language models, particularly multilingual ones, are designed, claimed, and expected to cater to native speakers of varied languages. We hypothesise that the current practices of fine-tuning and evaluating these models may not perfectly align with this objective owing to a heavy reliance on translation, which can introduce translation artefacts and defects. It remains unknown whether the nat…

    Submitted 11 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.