
Showing 1–50 of 81 results for author: Cao, P

Searching in archive cs.
  1. arXiv:2407.17942  [pdf, other]

    cs.RO cs.IT

    A Novel Perception Entropy Metric for Optimizing Vehicle Perception with LiDAR Deployment

    Authors: Yongjiang He, Peng Cao, Zhongling Su, Xiaobo Liu

    Abstract: Developing an effective evaluation metric is crucial for accurately and swiftly measuring LiDAR perception performance. One major issue is the lack of metrics that can simultaneously generate fast and accurate evaluations based on either object detection or point cloud data. In this study, we propose a novel LiDAR perception entropy metric based on the probability of vehicle grid occupancy. This m…

    Submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2407.15556  [pdf, other]

    cs.CL

    SETTP: Style Extraction and Tunable Inference via Dual-level Transferable Prompt Learning

    Authors: Chunzhen Jin, Yongfeng Huang, Yaqi Wang, Peng Cao, Osmar Zaiane

    Abstract: Text style transfer, an important research direction in natural language processing, aims to adapt the text to various preferences but often faces challenges with limited resources. In this work, we introduce a novel method termed Style Extraction and Tunable Inference via Dual-level Transferable Prompt Learning (SETTP) for effective style transfer in low-resource scenarios. First, SETTP learns so…

    Submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2407.13179  [pdf, other]

    eess.IV cs.CV

    Learned HDR Image Compression for Perceptually Optimal Storage and Display

    Authors: Peibei Cao, Haoyu Chen, Jingzhe Ma, Yu-Chieh Yuan, Zhiyong Xie, Xin Xie, Haiqing Bai, Kede Ma

    Abstract: High dynamic range (HDR) capture and display have seen significant growth in popularity driven by the advancements in technology and increasing consumer demand for superior image quality. As a result, HDR image compression is crucial to fully realize the benefits of HDR imaging without suffering from large file sizes and inefficient data handling. Conventionally, this is achieved by introducing a…

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.10943  [pdf, other]

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:…

    Submitted 15 July, 2024; originally announced July 2024.

  5. arXiv:2407.05248  [pdf, other]

    cs.CV

    Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

    Authors: Junming Su, Zhiqiang Shen, Peng Cao, Jinzhu Yang, Osmar R. Zaiane

    Abstract: The existing barely-supervised medical image segmentation (BSS) methods, adopting a registration-segmentation paradigm, aim to learn from data with very few annotations to mitigate the extreme label scarcity problem. However, this paradigm poses a challenge: pseudo-labels generated by image registration come with significant noise. To address this issue, we propose a self-paced sample selection fr…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  6. arXiv:2406.16033  [pdf, other]

    cs.CL

    Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models

    Authors: Tianyi Men, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Planning, as the core module of agents, is crucial in various fields such as embodied agents, web navigation, and tool using. With the development of large language models (LLMs), some researchers treat large language models as intelligent agents to stimulate and evaluate their planning capabilities. However, the planning mechanism is still unclear. In this work, we focus on exploring the look-ahe…

    Submitted 23 June, 2024; originally announced June 2024.

  7. arXiv:2406.12416  [pdf, other]

    cs.CL cs.AI

    Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models

    Authors: Hongbang Yuan, Yubo Chen, Pengfei Cao, Zhuoran Jin, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) have achieved remarkable success but still tend to generate factually erroneous responses, a phenomenon known as hallucination. A recent trend is to use preference learning to fine-tune models to align with factuality. However, existing work primarily evaluates fine-tuned models on in-domain (ID) datasets and the factuality on out-of-domain (OOD) datasets remains under…

    Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.11566  [pdf, other]

    cs.CL

    MEMLA: Enhancing Multilingual Knowledge Editing with Neuron-Masked Low-Rank Adaptation

    Authors: Jiakuan Xie, Pengfei Cao, Yuheng Chen, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Knowledge editing aims to adjust the knowledge within large language models (LLMs) to prevent their responses from becoming obsolete or inaccurate. However, existing works on knowledge editing are primarily conducted in a single language, which is inadequate for multilingual language models. In this paper, we focus on multilingual knowledge editing (MKE), which requires propagating updates across…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.10890  [pdf, other]

    cs.CL cs.AI cs.LG

    RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models

    Authors: Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) inevitably memorize sensitive, copyrighted, and harmful knowledge from the training corpus; therefore, it is crucial to erase this knowledge from the models. Machine unlearning is a promising solution for efficiently removing specific knowledge by post hoc modifying models. In this paper, we propose a Real-World Knowledge Unlearning benchmark (RWKU) for LLM unlearning.…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 48 pages, 7 figures, 12 tables

  10. arXiv:2406.03917  [pdf, other]

    cs.CV

    Frequency-based Matcher for Long-tailed Semantic Segmentation

    Authors: Shan Li, Lu Yang, Pu Cao, Liulei Li, Huadong Ma

    Abstract: The successful application of semantic segmentation technology in the real world has been among the most exciting achievements in the computer vision community over the past decade. Although the long-tailed phenomenon has been investigated in many fields, e.g., classification and object detection, it has not received enough attention in semantic segmentation and has become a non-negligible obstacl…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted for publication as a Regular paper in the IEEE Transactions on Multimedia

  11. arXiv:2405.18915  [pdf, other]

    cs.CL cs.AI

    Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners

    Authors: Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) suffer from serious unfaithful chain-of-thought (CoT) issues. Previous work attempts to measure and explain it but lacks in-depth analysis within CoTs and does not consider the interactions among all reasoning components jointly. In this paper, we first study the CoT faithfulness issue at the granularity of CoT steps, identify two reasoning paradigms: centralized reaso…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 25 pages, under review

  12. arXiv:2405.14117  [pdf, other]

    cs.CL cs.AI

    Knowledge Localization: Mission Not Accomplished? Enter Query Localization!

    Authors: Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) store extensive factual knowledge, but the mechanisms behind how they store and express this knowledge remain unclear. The Knowledge Neuron (KN) thesis is a prominent theory for explaining these mechanisms. This theory is based on the knowledge localization (KL) assumption, which suggests that a fact can be localized to a few knowledge storage units, namely knowledge n…

    Submitted 22 May, 2024; originally announced May 2024.

  13. arXiv:2405.13089  [pdf, other]

    cs.LG

    SEGAN: semi-supervised learning approach for missing data imputation

    Authors: Xiaohua Pan, Weifeng Wu, Peiran Liu, Zhen Li, Peng Lu, Peijian Cao, Jianfeng Zhang, Xianfei Qiu, YangYang Wu

    Abstract: In many practical real-world applications, data missing is a very common phenomenon, making the development of data-driven artificial intelligence theory and technology increasingly difficult. Data completion is an important method for missing data preprocessing. Most existing missing data completion models directly use the known information in the missing data set but ignore the impact of the da…

    Submitted 12 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  14. arXiv:2405.09777  [pdf, other]

    cs.CV

    Rethinking Barely-Supervised Segmentation from an Unsupervised Domain Adaptation Perspective

    Authors: Zhiqiang Shen, Peng Cao, Junming Su, Jinzhu Yang, Osmar R. Zaiane

    Abstract: This paper investigates an extremely challenging problem, barely-supervised medical image segmentation (BSS), where the training dataset comprises limited labeled data with only single-slice annotations and numerous unlabeled images. Currently, state-of-the-art (SOTA) BSS methods utilize a registration-based paradigm, depending on image registration to propagate single-slice annotations into volum…

    Submitted 15 May, 2024; originally announced May 2024.

  15. arXiv:2404.04887  [pdf, other]

    cs.CV

    A Clinical-oriented Multi-level Contrastive Learning Method for Disease Diagnosis in Low-quality Medical Images

    Authors: Qingshan Hou, Shuai Cheng, Peng Cao, Jinzhu Yang, Xiaoli Liu, Osmar R. Zaiane, Yih Chung Tham

    Abstract: Representation learning offers a conduit to elucidate distinctive features within the latent space and interpret the deep models. However, the randomness of lesion distribution and the complexity of low-quality factors in medical images pose great challenges for models to extract key lesion features. Disease diagnosis methods guided by contrastive learning (CL) have shown significant advantages in…

    Submitted 7 April, 2024; originally announced April 2024.

  16. arXiv:2403.17733  [pdf, other]

    cs.CL

    Continual Few-shot Event Detection via Hierarchical Augmentation Networks

    Authors: Chenlong Zhang, Pengfei Cao, Yubo Chen, Kang Liu, Zhiqiang Zhang, Mengshu Sun, Jun Zhao

    Abstract: Traditional continual event detection relies on abundant labeled data for training, which is often impractical to obtain in real-world applications. In this paper, we introduce continual few-shot event detection (CFED), a more commonly encountered scenario when a substantial number of labeled samples are not accessible. The CFED task is challenging as it involves memorizing previous event types an…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  17. arXiv:2403.10133  [pdf, other]

    cs.CV

    E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance

    Authors: Tianrui Huang, Pu Cao, Lu Yang, Chun Liu, Mengjie Hu, Zhiwei Liu, Qing Song

    Abstract: Diffusion-based image editing is a composite process of preserving the source image content and generating new content or applying modifications. While current editing approaches have made improvements under text guidance, most of them have only focused on preserving the information of the input image, disregarding the importance of editability and alignment to the target prompt. In this paper, we…

    Submitted 15 March, 2024; originally announced March 2024.

  18. arXiv:2403.08309  [pdf, other]

    cs.LG cs.AI

    HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback

    Authors: Ang Li, Qiugen Xiao, Peng Cao, Jian Tang, Yi Yuan, Zijie Zhao, Xiaoyuan Chen, Liang Zhang, Xiangyang Li, Kaitong Yang, Weidong Guo, Yukang Gan, Xu Yu, Daniell Wang, Ying Shan

    Abstract: Reinforcement Learning from AI Feedback (RLAIF) has the advantages of shorter annotation cycles and lower costs over Reinforcement Learning from Human Feedback (RLHF), making it highly efficient during the rapid strategy iteration periods of large language model (LLM) training. Using ChatGPT as a labeler to provide feedback on open-domain prompts in RLAIF training, we observe an increase in human…

    Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: 18 pages, 7 figures

  19. arXiv:2403.04279  [pdf, other]

    cs.CV

    Controllable Generation with Text-to-Image Diffusion Models: A Survey

    Authors: Pu Cao, Feng Zhou, Qing Song, Lu Yang

    Abstract: In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for conditioning these models does not fully cater to the varied and complex requirements of different applications and scenarios. Acknowledging this shortfall, a…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: A collection of resources on controllable generation with text-to-image diffusion models: https://github.com/PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models

  20. arXiv:2403.02959  [pdf, other]

    cs.CL cs.AI

    SimuCourt: Building Judicial Decision-Making Agents with Real-world Judgement Documents

    Authors: Zhitao He, Pengfei Cao, Chenhao Wang, Zhuoran Jin, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, Jun Zhao

    Abstract: With the development of deep learning, natural language processing technology has effectively improved the efficiency of various aspects of the traditional judicial industry. However, most current efforts focus solely on individual judicial stages, overlooking cross-stage collaboration. As the autonomous agents powered by large language models are becoming increasingly smart and able to make comple…

    Submitted 5 March, 2024; originally announced March 2024.

  21. arXiv:2403.02893  [pdf, other]

    cs.CL cs.AI

    Zero-Shot Cross-Lingual Document-Level Event Causality Identification with Heterogeneous Graph Contrastive Transfer Learning

    Authors: Zhitao He, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Zhiqiang Zhang, Mengshu Sun, Jun Zhao

    Abstract: Event Causality Identification (ECI) refers to the detection of causal relations between events in texts. However, most existing studies focus on sentence-level ECI with high-resource languages, leaving more challenging document-level ECI (DECI) with low-resource languages under-explored. In this paper, we propose a Heterogeneous Graph Interaction Model with Multi-granularity Contrastive Transfer…

    Submitted 22 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  22. arXiv:2402.19103  [pdf, other]

    cs.CL cs.AI

    Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models

    Authors: Hongbang Yuan, Pengfei Cao, Zhuoran Jin, Yubo Chen, Daojian Zeng, Kang Liu, Jun Zhao

    Abstract: Large Language Models (LLMs) have shown impressive capabilities but still suffer from the issue of hallucinations. A significant type of this issue is the false premise hallucination, which we define as the phenomenon when LLMs generate hallucinated text when confronted with false premise questions. In this paper, we perform a comprehensive analysis of the false premise hallucination and elucidate…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 12 pages, 5 figures, 5 tables

  23. arXiv:2402.18344  [pdf, other]

    cs.CL

    Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning

    Authors: Jiachun Li, Pengfei Cao, Chenhao Wang, Zhuoran Jin, Yubo Chen, Daojian Zeng, Kang Liu, Jun Zhao

    Abstract: Large language models exhibit high-level commonsense reasoning abilities, especially with enhancement methods like Chain-of-Thought (CoT). However, we find these CoT-like methods lead to a considerable number of originally correct answers turning wrong, which we define as the Toxic CoT problem. To interpret and mitigate this problem, we first utilize attribution tracing and causal tracing methods…

    Submitted 27 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted as a long paper to ACL 2024 Main, 25 pages, 22 figures

  24. arXiv:2402.18154  [pdf, other]

    cs.CL cs.AI cs.IR

    Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models

    Authors: Zhuoran Jin, Pengfei Cao, Hongbang Yuan, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, Jun Zhao

    Abstract: Recently, retrieval augmentation and tool augmentation have demonstrated a remarkable capability to expand the internal memory boundaries of language models (LMs) by providing external context. However, internal memory and external context inevitably clash, leading to knowledge conflicts within LMs. In this paper, we aim to interpret the mechanism of knowledge conflicts through the lens of informa…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 21 pages, 42 figures, 4 tables

  25. arXiv:2402.14409  [pdf, other]

    cs.CL cs.AI cs.IR

    Tug-of-War Between Knowledge: Exploring and Resolving Knowledge Conflicts in Retrieval-Augmented Language Models

    Authors: Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Xiaojian Jiang, Jiexin Xu, Qiuxia Li, Jun Zhao

    Abstract: Retrieval-augmented language models (RALMs) have demonstrated significant potential in refining and expanding their internal memory by retrieving evidence from external sources. However, RALMs will inevitably encounter knowledge conflicts when integrating their internal memory with external sources. Knowledge conflicts can ensnare RALMs in a tug-of-war between knowledge, limiting their practical a…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-COLING 2024

  26. arXiv:2402.13731  [pdf, other]

    cs.CL cs.AI

    Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models

    Authors: Yuheng Chen, Pengfei Cao, Yubo Chen, Yining Wang, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) store extensive factual knowledge, but the underlying mechanisms remain unclear. Previous research suggests that factual knowledge is stored within multi-layer perceptron weights, and some storage units exhibit degeneracy, referred to as Degenerate Knowledge Neurons (DKNs). Despite the novelty and unique properties of this concept, it has not been rigorously defined or…

    Submitted 16 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  27. arXiv:2402.10987  [pdf, other]

    cs.CL cs.AI

    WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing

    Authors: Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Knowledge editing aims to rectify inaccuracies in large language models (LLMs) without costly retraining for outdated or erroneous knowledge. However, current knowledge editing methods primarily focus on single editing, failing to meet the requirements for lifelong editing. This study reveals a performance degradation encountered by knowledge editing in lifelong editing, characterized by toxicity…

    Submitted 5 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: To be published in ACL Findings 2024

  28. arXiv:2312.15182  [pdf, other]

    eess.IV cs.CV cs.LG

    Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation

    Authors: Haonan Wang, Peng Cao, Xiaoli Liu, Jinzhu Yang, Osmar Zaiane

    Abstract: Most state-of-the-art methods for medical image segmentation adopt the encoder-decoder architecture. However, this U-shaped framework still has limitations in capturing the non-local multi-scale information with a simple skip connection. To solve the problem, we firstly explore the potential weakness of skip connections in U-Net on multiple segmentation tasks, and find that i) not all skip connect…

    Submitted 23 December, 2023; originally announced December 2023.

  29. arXiv:2312.08195  [pdf, other]

    cs.CV cs.AI cs.MM

    Concept-centric Personalization with Large-scale Diffusion Priors

    Authors: Pu Cao, Lu Yang, Feng Zhou, Tianrui Huang, Qing Song

    Abstract: Despite large-scale diffusion models being highly capable of generating diverse open-world content, they still struggle to match the photorealism and fidelity of concept-specific generators. In this work, we present the task of customizing large-scale diffusion priors for specific concepts as concept-centric personalization. Our goal is to generate high-quality concept-centric images while maintai…

    Submitted 13 December, 2023; originally announced December 2023.

  30. arXiv:2312.00987  [pdf, other]

    cs.CV cs.CY

    Deep Generative Attacks and Countermeasures for Data-Driven Offline Signature Verification

    Authors: An Ngo, Rajesh Kumar, Phuong Cao

    Abstract: This study investigates the vulnerabilities of data-driven offline signature verification (DASV) systems to generative attacks and proposes robust countermeasures. Specifically, we explore the efficacy of Variational Autoencoders (VAEs) and Conditional Generative Adversarial Networks (CGANs) in creating deceptive signatures that challenge DASV systems. Using the Structural Similarity Index (SSIM)…

    Submitted 17 July, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Ten pages, 6 figures, 1 table, Signature verification, Deep generative models, attacks, generative attack explainability, data-driven verification system

    ACM Class: K.6.5

  31. arXiv:2311.12537  [pdf, other]

    cs.CL cs.AI

    Oasis: Data Curation and Assessment System for Pretraining of Large Language Models

    Authors: Tong Zhou, Yubo Chen, Pengfei Cao, Kang Liu, Jun Zhao, Shengping Liu

    Abstract: Data is one of the most critical elements in building a large language model. However, existing systems either fail to customize a corpus curation pipeline or neglect to leverage comprehensive corpus assessment for iterative optimization of the curation. To this end, we present a pretraining corpus curation and assessment platform called Oasis -- a one-stop system for data quality improvement and…

    Submitted 21 November, 2023; originally announced November 2023.

  32. arXiv:2311.08045  [pdf, other]

    cs.CL cs.AI cs.LG

    Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game

    Authors: Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du, Xiaolong Li

    Abstract: Human preference alignment is essential to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However, continuously updating LLMs for alignment raises a distribution gap between model-generated samples and human-annotated responses, hindering training effectiveness. To mi…

    Submitted 3 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by ACL2024 findings

  33. arXiv:2310.16131  [pdf, other]

    cs.CL

    GenKIE: Robust Generative Multimodal Document Key Information Extraction

    Authors: Panfeng Cao, Ye Wang, Qiang Zhang, Zaiqiao Meng

    Abstract: Key information extraction (KIE) from scanned documents has gained increasing attention because of its applications in various domains. Although promising results have been achieved by some recent KIE approaches, they are usually built based on discriminative models, which lack the ability to handle optical character recognition (OCR) errors and require laborious token-level labelling. In this pap…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023, Findings paper

  34. arXiv:2310.12877  [pdf, other]

    eess.IV cs.CV

    Perceptual Assessment and Optimization of HDR Image Rendering

    Authors: Peibei Cao, Rafal K. Mantiuk, Kede Ma

    Abstract: High dynamic range (HDR) rendering has the ability to faithfully reproduce the wide luminance ranges in natural scenes, but how to accurately assess the rendering quality is relatively underexplored. Existing quality models are mostly designed for low dynamic range (LDR) images, and do not align well with human perception of HDR image quality. To fill this gap, we propose a family of HDR quality m…

    Submitted 16 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  35. arXiv:2309.00514  [pdf]

    cs.CV eess.IV

    A Machine Vision Method for Correction of Eccentric Error: Based on Adaptive Enhancement Algorithm

    Authors: Fanyi Wang, Pin Cao, Yihui Zhang, Haotian Hu, Yongying Yang

    Abstract: In the procedure of surface defects detection for large-aperture aspherical optical elements, it is of vital significance to adjust the optical axis of the element to be coaxial with the mechanical spin axis accurately. Therefore, a machine vision method for eccentric error correction is proposed in this paper. Focusing on the severe defocus blur of reference crosshair image caused by the imaging…

    Submitted 1 September, 2023; originally announced September 2023.

  36. arXiv:2308.15851  [pdf, other]

    cs.MM

    Prompting Vision Language Model with Knowledge from Large Language Model for Knowledge-Based VQA

    Authors: Yang Zhou, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Knowledge-based visual question answering is a very challenging and widely concerned task. Previous methods adopt the implicit knowledge in large language models (LLM) to achieve excellent results, but we argue that existing methods may suffer from biasing understanding of the image and insufficient knowledge to solve the problem. In this paper, we propose PROOFREAD -PROmpting vision language mod…

    Submitted 30 August, 2023; originally announced August 2023.

  37. arXiv:2308.14353  [pdf, other]

    cs.CL

    ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models

    Authors: Baoli Zhang, Haining Xie, Pengfan Du, Junhao Chen, Pengfei Cao, Yubo Chen, Shengping Liu, Kang Liu, Jun Zhao

    Abstract: The unprecedented performance of large language models (LLMs) requires comprehensive and accurate evaluation. We argue that for LLMs evaluation, benchmarks need to be comprehensive and systematic. To this end, we propose the ZhuJiu benchmark, which has the following strengths: (1) Multi-dimensional ability coverage: We comprehensively evaluate LLMs across 7 ability dimensions covering 51 tasks. Es…

    Submitted 28 August, 2023; originally announced August 2023.

  38. arXiv:2308.13198  [pdf, other]

    cs.CL

    Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

    Authors: Yuheng Chen, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Pre-trained language models (PLMs) contain vast amounts of factual knowledge, but how the knowledge is stored in the parameters remains unclear. This paper delves into the complex task of understanding how factual knowledge is stored in multilingual PLMs, and introduces the Architecture-adapted Multilingual Integrated Gradients method, which successfully localizes knowledge neurons more precisely…

    Submitted 20 December, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: Accepted in the 38th AAAI Conference on Artificial Intelligence (AAAI 2024)

  39. arXiv:2308.05692  [pdf, other]

    cs.DC

    Isolated Scheduling for Distributed Training Tasks in GPU Clusters

    Authors: Xinchi Han, Weihao Jiang, Peirui Cao, Qinwei Yang, Yunzhuo Liu, Shuyao Qi, Shengkai Lin, Shizhen Zhao

    Abstract: Distributed machine learning (DML) technology makes it possible to train large neural networks in a reasonable amount of time. Meanwhile, as the computing power grows much faster than network capacity, network communication has gradually become the bottleneck of DML. Current multi-tenant GPU clusters face network contention caused by hash-collision problem which not only further increases the over…

    Submitted 10 August, 2023; originally announced August 2023.

  40. arXiv:2305.05455  [pdf, other]

    cs.NI cs.DC cs.OS

    ONCache: A Cache-Based Low-Overhead Container Overlay Network

    Authors: Shengkai Lin, Shizhen Zhao, Peirui Cao, Xinchi Han, Quan Tian, Wenfeng Liu, Qi Wu, Donghai Han, Xinbing Wang

    Abstract: Recent years have witnessed a widespread adoption of containers. While containers simplify and accelerate application development, existing container network technologies either incur significant overhead, which hurts performance for distributed applications, or lose flexibility or compatibility, which hinders the widespread deployment in production. We carefully analyze the kernel data path of…

    Submitted 4 June, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

  41. arXiv:2301.12141  [pdf, other]

    cs.CV cs.AI

    What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion

    Authors: Pu Cao, Lu Yang, Dongxv Liu, Xiaoya Yang, Tianrui Huang, Qing Song

    Abstract: Recently, inversion methods have focused on additional high-rate information in the generator (e.g., weights or intermediate features) to refine inversion and editing results from embedded latent codes. Although these techniques gain reasonable improvement in reconstruction, they decrease editing capability, especially on complex images (e.g., containing occlusions, detailed backgrounds, and artif…

    Submitted 1 November, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: Accepted by WACV 2024

  42. arXiv:2301.06943  [pdf, other]

    eess.IV cs.CV

    Self-supervised Domain Adaptation for Breaking the Limits of Low-quality Fundus Image Quality Enhancement

    Authors: Qingshan Hou, Peng Cao, Jiaqi Wang, Xiaoli Liu, Jinzhu Yang, Osmar R. Zaiane

    Abstract: Retinal fundus images have been applied for the diagnosis and screening of eye diseases, such as Diabetic Retinopathy (DR) or Diabetic Macular Edema (DME). However, both low-quality fundus images and style inconsistency potentially increase uncertainty in the diagnosis of fundus disease and even lead to misdiagnosis by ophthalmologists. Most of the existing image enhancement methods mainly focus o…

    Submitted 17 January, 2023; originally announced January 2023.

  43. arXiv:2301.04465  [pdf, ps, other]

    cs.CV

    Co-training with High-Confidence Pseudo Labels for Semi-supervised Medical Image Segmentation

    Authors: Zhiqiang Shen, Peng Cao, Hua Yang, Xiaoli Liu, Jinzhu Yang, Osmar R. Zaiane

    Abstract: Consistency regularization and pseudo labeling-based semi-supervised methods perform co-training using the pseudo labels from multi-view inputs. However, such co-training models tend to converge early to a consensus, degenerating to the self-training ones, and produce low-confidence pseudo labels from the perturbed inputs during training. To address these issues, we propose an Uncertainty-guided C…

    Submitted 26 May, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  44. arXiv:2301.00592  [pdf, other]

    cs.CV eess.IV

    Edge Enhanced Image Style Transfer via Transformers

    Authors: Chiyu Zhang, Jun Yang, Zaiyan Dai, Peng Cao

    Abstract: In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, a stylized one is hoped that retains the content from the former while catching style patterns from the latter. However, it is difficult to simultaneously keep well the trade-off between the content details and the style features. To stylize the image with sufficient sty…

    Submitted 2 January, 2023; originally announced January 2023.

  45. arXiv:2212.03357  [pdf, other]

    cs.LG eess.SP

    Contactless Oxygen Monitoring with Gated Transformer

    Authors: Hao He, Yuan Yuan, Ying-Cong Chen, Peng Cao, Dina Katabi

    Abstract: With the increasing popularity of telehealth, it becomes critical to ensure that basic physiological signals can be monitored accurately at home, with minimal patient overhead. In this paper, we propose a contactless approach for monitoring patients' blood oxygen at home, simply by analyzing the radio signals in the room, without any wearable devices. We extract the patients' respiration from the…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 19 pages, Workshop on Learning from Time Series for Health, NeurIPS 2022

  46. Generalized Deep Learning-based Proximal Gradient Descent for MR Reconstruction

    Authors: Guanxiong Luo, Mengmeng Kuang, Peng Cao

    Abstract: The data consistency for the physical forward model is crucial in inverse problems, especially in MR imaging reconstruction. The standard way is to unroll an iterative algorithm into a neural network with a forward model embedded. The forward model always changes in clinical practice, so the learning component's entanglement with the forward model makes the reconstruction hard to generalize. The d…

    Submitted 18 March, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Keywords: MRI reconstruction, Deep Learning, Proximal gradient descent, Learned regularization term

  47. arXiv:2210.01189  [pdf, other]

    cs.LG cs.AI cs.CV

    Rank-N-Contrast: Learning Continuous Representations for Regression

    Authors: Kaiwen Zha, Peng Cao, Jeany Son, Yuzhe Yang, Dina Katabi

    Abstract: Deep regression models typically learn in an end-to-end fashion without explicitly emphasizing a regression-aware representation. Consequently, the learned representations exhibit fragmentation and fail to capture the continuous nature of sample orders, inducing suboptimal results across a wide range of regression tasks. To fill the gap, we propose Rank-N-Contrast (RNC), a framework that learns co…

    Submitted 9 October, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2023 Spotlight. The first two authors contributed equally to this paper

  48. arXiv:2209.12746  [pdf, other]

    cs.CV cs.AI cs.LG

    LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space

    Authors: Pu Cao, Lu Yang, Dongxu Liu, Zhiwei Liu, Shan Li, Qing Song

    Abstract: As the methods evolve, inversion is mainly divided into two steps. The first step is Image Embedding, in which an encoder or optimization process embeds images to get the corresponding latent codes. Afterward, the second step aims to refine the inversion and editing results, which we named Result Refinement. Although the second step significantly improves fidelity, perception and editability are a…

    Submitted 16 March, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: under review

  49. arXiv:2207.14769  [pdf, other]

    cs.CV

    Image Quality Assessment: Integrating Model-Centric and Data-Centric Approaches

    Authors: Peibei Cao, Dingquan Li, Kede Ma

    Abstract: Learning-based image quality assessment (IQA) has made remarkable progress in the past decade, but nearly all consider the two key components -- model and data -- in isolation. Specifically, model-centric IQA focuses on developing "better" objective quality methods on fixed and extensively reused datasets, with a great danger of overfitting. Data-centric IQA involves conducting psychophysical ex…

    Submitted 8 December, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

  50. arXiv:2206.09146  [pdf, other]

    eess.IV cs.AI cs.CV

    A Perceptually Optimized and Self-Calibrated Tone Mapping Operator

    Authors: Peibei Cao, Chenyang Le, Yuming Fang, Kede Ma

    Abstract: With the increasing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are practically demanding. In this paper, we develop a two-stage neural network-based TMO that is self-calibrated and perceptually optimized. In Stage one, motivated by the physiology of the early stages of the human visual system, we first decompose…

    Submitted 25 August, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: 15 pages, 17 figures