Showing 1–50 of 199 results for author: Zheng, M

Searching in archive cs. Search in all archives.
  1. arXiv:2407.02157  [pdf, other]

    cs.CV cs.HC

    FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs

    Authors: Haodong Chen, Haojian Huang, Junhao Dong, Mingzhe Zheng, Dian Shao

    Abstract: Dynamic Facial Expression Recognition (DFER) is crucial for understanding human behavior. However, current methods exhibit limited performance mainly due to the scarcity of high-quality data, the insufficient utilization of facial dynamics, and the ambiguity of expression semantics, etc. To this end, we propose a novel framework, named Multi-modal Fine-grained CLIP for Dynamic Facial Expression Re…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Project Page: https://haroldchen19.github.io/FineCLIPER-Page/

  2. arXiv:2406.18725  [pdf, other]

    cs.LG cs.CL

    Jailbreaking LLMs with Arabic Transliteration and Arabizi

    Authors: Mansour Al Ghanim, Saleh Almohaimeed, Mengxin Zheng, Yan Solihin, Qian Lou

    Abstract: This study identifies the potential vulnerabilities of Large Language Models (LLMs) to 'jailbreak' attacks, specifically focusing on the Arabic language and its various forms. While most research has concentrated on English-based prompt manipulation, our investigation broadens the scope to investigate the Arabic language. We initially tested the AdvBench benchmark in Standardized Arabic, finding t…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  3. arXiv:2406.15695  [pdf, other]

    cs.CL

    SS-Bench: A Benchmark for Social Story Generation and Evaluation

    Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu

    Abstract: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timelin…

    Submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2406.14599  [pdf, other]

    cs.CV

    Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models

    Authors: Matthew Zheng, Enis Simsar, Hidir Yesiltepe, Federico Tombari, Joel Simon, Pinar Yanardag

    Abstract: Text-to-image models are becoming increasingly popular, revolutionizing the landscape of digital art creation by enabling highly detailed and creative visual content generation. These models have been widely employed across various domains, particularly in art generation, where they facilitate a broad spectrum of creative expression and democratize access to artistic creation. In this paper, we in…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.11629  [pdf, other]

    cs.CL

    Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better!

    Authors: Mingyang Song, Mao Zheng, Xuan Luo

    Abstract: Leveraging Large Language Models (LLMs) as judges for judging the performance of LLMs has recently garnered attention. However, this type of approach is affected by the potential biases in LLMs, raising concerns about the reliability of the evaluation results. To mitigate this issue, we propose and study two versions of many-shot in-context prompts, which rely on two existing settings of many-shot…

    Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: work in progress

  6. arXiv:2406.08283  [pdf, other]

    cs.RO eess.SY

    A Hybrid Task-Constrained Motion Planning for Collaborative Robots in Intelligent Remanufacturing

    Authors: Wansong Liu, Chang Liu, Xiao Liang, Minghui Zheng

    Abstract: Industrial manipulators have extensively collaborated with human operators to execute tasks, e.g., disassembly of end-of-use products, in intelligent remanufacturing. A safety task execution requires real-time path planning for the manipulator's end-effector to autonomously avoid human operators. This is even more challenging when the end-effector needs to follow a planned path while avoiding the…

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.08100  [pdf, other]

    cs.CL cs.AI

    Multimodal Table Understanding

    Authors: Mingyu Zheng, Xinwei Feng, Qingyi Si, Qiaoqiao She, Zheng Lin, Wenbin Jiang, Weiping Wang

    Abstract: Although great progress has been made by previous table understanding methods including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a certain text sequence (such as Markdown or HTML) to serve as model input. However, it is difficult to access such high-quality textual table representations in some real-world sce…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 23 pages, 16 figures, ACL 2024 main conference, camera-ready version

  8. arXiv:2406.02518  [pdf, other]

    cs.CV eess.IV

    DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering

    Authors: Zhongpai Gao, Benjamin Planche, Meng Zheng, Xiao Chen, Terrence Chen, Ziyan Wu

    Abstract: Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks, especially for accurate but heavy physics-based Monte Carlo methods. While analytical DRR renderers offer greater efficiency, they overlook anisotropic X-ray image formation phenomena…

    Submitted 4 June, 2024; originally announced June 2024.

  9. arXiv:2406.01873  [pdf, other]

    cs.CL cs.CR cs.LG

    CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models

    Authors: Qian Lou, Xin Liang, Jiaqi Xue, Yancheng Zhang, Rui Xie, Mengxin Zheng

    Abstract: It is imperative to ensure the stability of every prediction made by a language model; that is, a language's prediction should remain consistent despite minor input variations, like word substitutions. In this paper, we investigate the problem of certifying a language model's robustness against Universal Text Perturbations (UTPs), which have been widely used in universal adversarial attacks and ba…

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL Findings 2024

  10. arXiv:2406.00083  [pdf, other]

    cs.CR cs.AI cs.CL cs.IR cs.LG

    BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

    Authors: Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou

    Abstract: Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data, commonly referred to as "hallucinations." Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models. This approach involves retrieving relevant information from a large, up-to-date dataset and using it to…

    Submitted 6 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  11. arXiv:2405.11913  [pdf, other]

    cs.CV

    Diff-BGM: A Diffusion Model for Video Background Music Generation

    Authors: Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, Yang Liu

    Abstract: When editing a video, a piece of attractive background music is indispensable. However, video background music generation tasks face several challenges, for example, the lack of suitable training datasets, and the difficulties in flexibly controlling the music generation process and sequentially aligning the video and music. In this work, we first propose a high-quality music-video dataset BGM909…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024(Poster)

  12. arXiv:2405.11607  [pdf, other]

    cs.CR cs.AR

    OFHE: An Electro-Optical Accelerator for Discretized TFHE

    Authors: Mengxin Zheng, Cheng Chu, Qian Lou, Nathan Youngblood, Mo Li, Sajjad Moazeni, Lei Jiang

    Abstract: This paper presents OFHE, an electro-optical accelerator designed to process Discretized TFHE (DTFHE) operations, which encrypt multi-bit messages and support homomorphic multiplications, lookup table operations and full-domain functional bootstrappings. While DTFHE is more efficient and versatile than other fully homomorphic encryption schemes, it requires 32-, 64-, and 128-bit polynomia…

    Submitted 19 May, 2024; originally announced May 2024.

  13. arXiv:2405.09779  [pdf, other]

    cs.RO

    Integrating Uncertainty-Aware Human Motion Prediction into Graph-Based Manipulator Motion Planning

    Authors: Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

    Abstract: There has been a growing utilization of industrial robots as complementary collaborators for human workers in re-manufacturing sites. Such a human-robot collaboration (HRC) aims to assist human workers in improving the flexibility and efficiency of labor-intensive tasks. In this paper, we propose a human-aware motion planning framework for HRC to effectively compute collision-free motions for mani…

    Submitted 15 May, 2024; originally announced May 2024.

  14. arXiv:2405.07962  [pdf, other]

    cs.RO eess.SY

    KG-Planner: Knowledge-Informed Graph Neural Planning for Collaborative Manipulators

    Authors: Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

    Abstract: This paper presents a novel knowledge-informed graph neural planner (KG-Planner) to address the challenge of efficiently planning collision-free motions for robots in high-dimensional spaces, considering both static and dynamic environments involving humans. Unlike traditional motion planners that struggle with finding a balance between efficiency and optimality, the KG-Planner takes a different a…

    Submitted 13 May, 2024; originally announced May 2024.

  15. arXiv:2404.12141  [pdf, other]

    q-bio.BM cs.LG

    MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space

    Authors: Yanru Qu, Keyue Qiu, Yuxuan Song, Jingjing Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, Wei-Ying Ma

    Abstract: Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and…

    Submitted 27 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML 2024

  16. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  17. Improving Disturbance Estimation and Suppression via Learning among Systems with Mismatched Dynamics

    Authors: Harsh Modi, Zhu Chen, Xiao Liang, Minghui Zheng

    Abstract: Iterative learning control (ILC) is a method for reducing system tracking or estimation errors over multiple iterations by using information from past iterations. The disturbance observer (DOB) is used to estimate and mitigate disturbances within the system, while the system is being affected by them. ILC enhances system performance by introducing a feedforward signal in each iteration. However, i…

    Submitted 15 April, 2024; originally announced April 2024.

  18. arXiv:2404.05595  [pdf, other]

    cs.CV

    UniFL: Improve Stable Diffusion via Unified Feedback Learning

    Authors: Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Min Zheng, Lean Fu, Guanbin Li

    Abstract: Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications. However, despite these significant advancements, the current competitive solutions still suffer from several limitations, including inferior visual quality, a lack of aesthetic appeal, and inefficient inference, without a comprehensive solutio…

    Submitted 22 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  19. arXiv:2404.04860  [pdf, other]

    cs.CV

    ByteEdit: Boost, Comply and Accelerate Generative Image Editing

    Authors: Yuxi Ren, Jie Wu, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu

    Abstract: Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field grapples with inherent challenges, including: i) inferior quality; ii) poor consistency; iii) insufficient instruction adherence; iv) suboptimal generation efficiency. To address these obstacles, we p…

    Submitted 7 April, 2024; originally announced April 2024.

  20. arXiv:2403.15441  [pdf, other]

    physics.chem-ph cs.AI cs.LG q-bio.BM

    Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks

    Authors: Yuxuan Song, Jingjing Gong, Yanru Qu, Hao Zhou, Mingyue Zheng, Jingjing Liu, Wei-Ying Ma

    Abstract: Advanced generative model (e.g., diffusion model) derived from simplified continuity assumptions of data distribution, though showing promising progress, has been difficult to apply directly to geometry generation applications due to the multi-modality and noise-sensitive nature of molecule geometry. This work introduces Geometric Bayesian Flow Networks (GeoBFN), which naturally fits molecule geom…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  21. arXiv:2403.11802  [pdf, other]

    cs.CL

    Counting-Stars: A Multi-evidence, Position-aware, and Scalable Benchmark for Evaluating Long-Context Large Language Models

    Authors: Mingyang Song, Mao Zheng, Xuan Luo

    Abstract: While recent research endeavors have focused on developing Large Language Models (LLMs) with robust long-context capabilities, due to the lack of long-context benchmarks, relatively little is known about how well long-context LLMs perform. To address this gap, we propose a multi-evidence, position-aware, and scalable benchmark for evaluating long-context LLMs, named Counting-Stars, whic…

    Submitted 17 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: work in progress

  22. arXiv:2403.05918  [pdf]

    cs.LG cs.AI

    SEMRes-DDPM: Residual Network Based Diffusion Modelling Applied to Imbalanced Data

    Authors: Ming Zheng, Yang Yang, Zhi-Hang Zhao, Shan-Chao Gan, Yang Chen, Si-Kai Ni, Yang Lu

    Abstract: In the field of data mining and machine learning, commonly used classification models cannot effectively learn in unbalanced data. In order to balance the data distribution before model training, oversampling methods are often used to generate data for a small number of classes to solve the problem of classifying unbalanced data. Most of the classical oversampling methods are based on the SMOTE te…

    Submitted 11 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: None

  23. arXiv:2403.05807  [pdf, other]

    cs.CV eess.IV

    A self-supervised CNN for image watermark removal

    Authors: Chunwei Tian, Menghua Zheng, Tiancai Jiao, Wangmeng Zuo, Yanning Zhang, Chia-Wen Lin

    Abstract: Popular convolutional neural networks mainly use paired images in a supervised way for image watermark removal. However, watermarked images do not have reference images in the real world, which results in poor robustness of image watermark removal techniques. In this paper, we propose a self-supervised convolutional neural network (CNN) in image watermark removal (SWCNN). SWCNN uses a self-supervi…

    Submitted 9 March, 2024; originally announced March 2024.

  24. arXiv:2403.05758  [pdf, other]

    cs.CV

    Automating Catheterization Labs with Real-Time Perception

    Authors: Fan Yang, Benjamin Planche, Meng Zheng, Cheng Chen, Terrence Chen, Ziyan Wu

    Abstract: For decades, three-dimensional C-arm Cone-Beam Computed Tomography (CBCT) imaging system has been a critical component for complex vascular and nonvascular interventional procedures. While it can significantly improve multiplanar soft tissue imaging and provide pre-treatment target lesion roadmapping and guidance, the traditional workflow can be cumbersome and time-consuming, especially for less e…

    Submitted 8 March, 2024; originally announced March 2024.

  25. arXiv:2403.03217  [pdf, other]

    cs.CV

    Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion

    Authors: Meng Zheng, Benjamin Planche, Xuan Gong, Fan Yang, Terrence Chen, Ziyan Wu

    Abstract: 3D patient body modeling is critical to the success of automated patient positioning for smart medical scanning and operating rooms. Existing CNN-based end-to-end patient modeling solutions typically require a) customized network designs demanding large amount of relevant training data, covering extensive realistic clinical scenarios (e.g., patient covered by sheets), which leads to suboptimal gen…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: MICCAI 2022

  26. arXiv:2403.02211  [pdf, other]

    cs.CV

    Perceptive self-supervised learning network for noisy image watermark removal

    Authors: Chunwei Tian, Menghua Zheng, Bo Li, Yanning Zhang, Shichao Zhang, David Zhang

    Abstract: Popular methods usually use a degradation model in a supervised way to learn a watermark removal model. However, it is true that reference images are difficult to obtain in the real world, as well as collected images by cameras suffer from noise. To overcome these drawbacks, we propose a perceptive self-supervised learning network for noisy image watermark removal (PSLNet) in this paper. PSLNet de…

    Submitted 4 March, 2024; originally announced March 2024.

  27. arXiv:2403.02084  [pdf, other]

    cs.CV

    ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

    Authors: Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Min Zheng, Lean Fu

    Abstract: Recent advancement in text-to-image models (e.g., Stable Diffusion) and corresponding personalized technologies (e.g., DreamBooth and LoRA) enables individuals to generate high-quality and imaginative images. However, they often suffer from limitations when generating images with resolutions outside of their trained domain. To overcome this limitation, we present the Resolution Adapter (ResAdapter…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 21 pages, 16 figures

  28. arXiv:2402.17200  [pdf, other]

    cs.CV eess.IV

    Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain

    Authors: Qunliang Xing, Mai Xu, Shengxi Li, Xin Deng, Meisong Zheng, Huaida Liu, Ying Chen

    Abstract: Existing quality enhancement methods for compressed images focus on aligning the enhancement domain with the raw domain to yield realistic images. However, these methods exhibit a pervasive enhancement bias towards the compression domain, inadvertently regarding it as more realistic than the raw domain. This bias makes enhanced images closely resemble their compressed counterparts, thus degrading…

    Submitted 19 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024

  29. arXiv:2402.16929  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language

    Authors: Ming Wang, Yuanzhong Liu, Xiaoyu Liang, Songlian Li, Yijie Huang, Xiaoming Zhang, Sijia Shen, Chaofeng Guan, Daling Wang, Shi Feng, Huaiwen Zhang, Yifei Zhang, Minghui Zheng, Chi Zhang

    Abstract: LLMs have demonstrated commendable performance across diverse domains. Nevertheless, formulating high-quality prompts to instruct LLMs proficiently poses a challenge for non-AI experts. Existing research in prompt engineering suggests somewhat scattered optimization principles and designs empirically dependent prompt optimizers. Unfortunately, these endeavors lack a structured design template, inc…

    Submitted 29 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  30. arXiv:2402.14315  [pdf, other]

    q-bio.BM cs.LG

    Structure-Based Drug Design via 3D Molecular Generative Pre-training and Sampling

    Authors: Yuwei Yang, Siqi Ouyang, Xueyu Hu, Mingyue Zheng, Hao Zhou, Lei Li

    Abstract: Structure-based drug design aims at generating high affinity ligands with prior knowledge of 3D target structures. Existing methods either use conditional generative model to learn the distribution of 3D ligands given target binding sites, or iteratively modify molecules to optimize a structure-based activity estimator. The former is highly constrained by data quantity and quality, which leaves op…

    Submitted 15 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  31. arXiv:2402.13045  [pdf, other]

    cs.RO

    A Recurrent Neural Network Enhanced Unscented Kalman Filter for Human Motion Prediction

    Authors: Wansong Liu, Sibo Tian, Boyi Hu, Xiao Liang, Minghui Zheng

    Abstract: This paper presents a deep learning enhanced adaptive unscented Kalman filter (UKF) for predicting human arm motion in the context of manufacturing. Unlike previous network-based methods that solely rely on captured human motion data, which is represented as bone vectors in this paper, we incorporate a human arm dynamic model into the motion prediction algorithm and use the UKF to iteratively fore…

    Submitted 20 February, 2024; originally announced February 2024.

  32. arXiv:2402.07814  [pdf, other]

    cs.CV cs.AI

    PBADet: A One-Stage Anchor-Free Approach for Part-Body Association

    Authors: Zhongpai Gao, Huayi Zhou, Abhishek Sharma, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu

    Abstract: The detection of human parts (e.g., hands, face) and their correct association with individuals is an essential task, e.g., for ubiquitous human-machine interfaces and action recognition. Traditional methods often employ multi-stage processes, rely on cumbersome anchor-based systems, or do not scale well to larger part sets. This paper presents PBADet, a novel one-stage, anchor-free approach for p…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR2024

  33. VRMM: A Volumetric Relightable Morphable Head Model

    Authors: Haotian Yang, Mingwu Zheng, Chongyang Ma, Yu-Kun Lai, Pengfei Wan, Haibin Huang

    Abstract: In this paper, we introduce the Volumetric Relightable Morphable Model (VRMM), a novel volumetric and parametric facial prior for 3D face modeling. While recent volumetric prior models offer improvements over traditional methods like 3D Morphable Models (3DMMs), they face challenges in model learning and personalized reconstructions. Our VRMM overcomes these by employing a novel training framework…

    Submitted 8 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted to SIGGRAPH 2024 (Conference); Project page: https://vrmm-paper.github.io/

  34. arXiv:2401.16355  [pdf, other]

    cs.CV

    PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

    Authors: Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Dan Wan, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin, Lin Yang

    Abstract: The emergence of large multimodal models has unlocked remarkable potential in AI, particularly in pathology. However, the lack of specialized, high-quality benchmark impeded their development and precise evaluation. To address this, we introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for Large Multimodal Models (LMMs). It comprises 33,428 multimodal multi-cho…

    Submitted 20 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 27 pages, 12 figures

  35. arXiv:2312.16979  [pdf, other]

    cs.CR

    BlackboxBench: A Comprehensive Benchmark of Black-box Adversarial Attacks

    Authors: Meixi Zheng, Xuanchen Yan, Zihao Zhu, Hongrui Chen, Baoyuan Wu

    Abstract: Adversarial examples are well-known tools to evaluate the vulnerability of deep neural networks (DNNs). Although lots of adversarial attack algorithms have been developed, it is still challenging in the practical scenario that the model's parameters and architectures are inaccessible to the attacker/evaluator, i.e., black-box adversarial attacks. Due to the practical importance, there has been rap…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 37 pages, 29 figures

  36. arXiv:2312.16693  [pdf, other]

    cs.CV

    I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

    Authors: Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, Zhengjun Zha, Haibin Huang, Chongyang Ma

    Abstract: Text-guided image-to-video (I2V) generation aims to generate a coherent video that preserves the identity of the input image and semantically aligns with the input prompt. Existing methods typically augment pretrained text-to-video (T2V) models by either concatenating the image with noised video frames channel-wise before being fed into the model or injecting the image embedding produced by pretra…

    Submitted 26 June, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

  37. arXiv:2312.14033  [pdf, other]

    cs.CL

    T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

    Authors: Zehui Chen, Weihua Du, Wenwei Zhang, Kuikun Liu, Jiangning Liu, Miao Zheng, Jingming Zhuo, Songyang Zhang, Dahua Lin, Kai Chen, Feng Zhao

    Abstract: Large language models (LLM) have achieved remarkable performance on various NLP tasks and are augmented by tools for broader applications. Yet, how to evaluate and analyze the tool-utilization capability of LLMs is still under-explored. In contrast to previous works that evaluate models holistically, we comprehensively decompose the tool utilization into multiple sub-processes, including instructi…

    Submitted 14 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project: https://open-compass.github.io/T-Eval

  38. arXiv:2312.10508  [pdf, other]

    cs.LG

    TrojFair: Trojan Fairness Attacks

    Authors: Mengxin Zheng, Jiaqi Xue, Yi Sheng, Lei Yang, Qian Lou, Lei Jiang

    Abstract: Deep learning models have been incorporated into high-stakes sectors, including healthcare diagnosis, loan approvals, and candidate recruitment, among others. Consequently, any bias or unfairness in these models can harm those who depend on such models. In response, many algorithms have emerged to ensure fairness in deep learning. However, while the potential for harm is substantial, the resilienc…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 12 pages, 2 figures

  39. arXiv:2312.10467  [pdf, other]

    cs.LG

    TrojFSP: Trojan Insertion in Few-shot Prompt Tuning

    Authors: Mengxin Zheng, Jiaqi Xue, Xun Chen, YanShan Wang, Qian Lou, Lei Jiang

    Abstract: Prompt tuning is one of the most effective solutions to adapting a fixed pre-trained language model (PLM) for various downstream tasks, especially with only a few input samples. However, the security issues, e.g., Trojan attacks, of prompt tuning on a few data samples are not well-studied. Transferring established data poisoning attacks directly to few-shot prompt tuning presents multiple challeng…

    Submitted 18 March, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: 9 pages, 2 figures

  40. arXiv:2312.10246  [pdf, other]

    cs.CV cs.LG

    Implicit Modeling of Non-rigid Objects with Cross-Category Signals

    Authors: Yuchun Liu, Benjamin Planche, Meng Zheng, Zhongpai Gao, Pierre Sibut-Bourde, Fan Yang, Terrence Chen, Ziyan Wu

    Abstract: Deep implicit functions (DIFs) have emerged as a potent and articulate means of representing 3D shapes. However, methods modeling object categories or non-rigid entities have mainly focused on single-object scenarios. In this work, we propose MODIF, a multi-object deep implicit function that jointly learns the deformation fields and instance-specific latent codes for multiple objects at once. Our…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024. Paper + supplementary material

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 38(1), 2024

  41. arXiv:2312.08890  [pdf, other]

    cs.CV cs.CR cs.LG

    Defenses in Adversarial Machine Learning: A Survey

    Authors: Baoyuan Wu, Shaokui Wei, Mingli Zhu, Meixi Zheng, Zihao Zhu, Mingda Zhang, Hongrui Chen, Danni Yuan, Li Liu, Qingshan Liu

    Abstract: Adversarial phenomenon has been widely observed in machine learning (ML) systems, especially in those using deep neural networks, describing that ML systems may produce inconsistent and incomprehensible predictions with humans at some particular cases. This phenomenon poses a serious security threat to the practical application of ML systems, and several advanced attack paradigms have been develop…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 21 pages, 5 figures, 2 tables, 237 reference papers

  42. arXiv:2312.08664  [pdf, other]

    cs.CV

    SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration

    Authors: Kezheng Xiong, Maoji Zheng, Qingshan Xu, Chenglu Wen, Siqi Shen, Cheng Wang

    Abstract: Point cloud registration, a fundamental task in 3D computer vision, has remained largely unexplored in cross-source point clouds and unstructured scenes. The primary challenges arise from noise, outliers, and variations in scale and density. However, neglected geometric natures of point clouds restricts the performance of current methods. In this paper, we propose a novel method termed SPEAL to le…

    Submitted 3 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  43. arXiv:2312.04028  [pdf, other]

    cs.CV

    ImFace++: A Sophisticated Nonlinear 3D Morphable Face Model with Implicit Neural Representations

    Authors: Mingwu Zheng, Haiyu Zhang, Hongyu Yang, Liming Chen, Di Huang

    Abstract: Accurate representations of 3D faces are of paramount importance in various computer vision and graphics applications. However, the challenges persist due to the limitations imposed by data discretization and model linearity, which hinder the precise capture of identity and expression clues in current studies. This paper presents a novel 3D morphable face model, named ImFace++, to learn a sophisti…

    Submitted 2 January, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project page: https://github.com/MingwuZheng/ImFace/tree/imface%2B%2B. arXiv admin note: text overlap with arXiv:2203.14510

  44. arXiv:2311.15906  [pdf, other]

    cs.CV cs.LG

    MetaDefa: Meta-learning based on Domain Enhancement and Feature Alignment for Single Domain Generalization

    Authors: Can Sun, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng, Bo Xu

    Abstract: The single domain generalization (SDG) based on meta-learning has emerged as an effective technique for solving the domain-shift problem. However, the inadequate match of data distribution between source and augmented domains and difficult separation of domain-invariant features from domain-related features make SDG model hard to achieve great generalization. Therefore, a novel meta-learning method…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures

  45. arXiv:2311.15231  [pdf, other]

    cs.CV cs.LG eess.IV

    Double Reverse Regularization Network Based on Self-Knowledge Distillation for SAR Object Classification

    Authors: Bo Xu, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng

    Abstract: In current synthetic aperture radar (SAR) object classification, one of the major challenges is the severe overfitting issue due to the limited dataset (few-shot) and noisy data. Considering the advantages of knowledge distillation as a learned label smoothing regularization, this paper proposes a novel Double Reverse Regularization Network based on Self-Knowledge Distillation (DRRNet-SKD). Specif…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 6 pages, 8 figures

  46. arXiv:2311.15202  [pdf, other]

    cs.CV

    Dual-stream contrastive predictive network with joint handcrafted feature view for SAR ship classification

    Authors: Xianting Feng, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng

    Abstract: Most existing synthetic aperture radar (SAR) ship classification technologies heavily rely on correctly labeled data, ignoring the discriminative features of unlabeled SAR ship images. Even though researchers try to enrich CNN-based features by introducing traditional handcrafted features, existing methods easily cause information redundancy and fail to capture the interaction between them. To add…

    Submitted 30 November, 2023; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: 6 pages, 3 figures, ICASSP2024

  47. arXiv:2311.10054  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

    Authors: Mingqian Zheng, Jiaxin Pei, David Jurgens

    Abstract: Prompting serves as the major way humans interact with Large Language Models (LLM). Commercial AI systems commonly define the role of the LLM in system prompts. For example, ChatGPT uses "You are a helpful assistant" as part of the default system prompt. But is "a helpful assistant" the best role for LLMs? In this study, we present a systematic evaluation of how social roles in system prompts affe…

    Submitted 16 November, 2023; originally announced November 2023.

  48. arXiv:2311.08811  [pdf, other]

    cs.CV cs.LG

    Correlation-aware active learning for surgery video segmentation

    Authors: Fei Wu, Pablo Marquez-Neila, Mingyi Zheng, Hedyeh Rafii-Tari, Raphael Sznitman

    Abstract: Semantic segmentation is a complex task that relies heavily on large amounts of annotated image data. However, annotating such data can be time-consuming and resource-intensive, especially in the medical domain. Active Learning (AL) is a popular approach that can help to reduce this burden by iteratively selecting images for annotation to improve the model performance. In the case of video data, i…

    Submitted 11 December, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: WACV 2024, 8 pages, 7 supplementary pages

  49. arXiv:2310.13643  [pdf, other]

    cs.RO

    A Review of Prospects and Opportunities in Disassembly with Human-Robot Collaboration

    Authors: Meng-Lun Lee, Xiao Liang, Boyi Hu, Gulcan Onel, Sara Behdad, Minghui Zheng

    Abstract: Product disassembly plays a crucial role in the recycling, remanufacturing, and reuse of end-of-use (EoU) products. However, the current manual disassembly process is inefficient due to the complexity and variation of EoU products. While fully automating disassembly is not economically viable given the intricate nature of the task, there is potential in using human-robot collaboration (HRC) to enh…

    Submitted 20 October, 2023; originally announced October 2023.

  50. Notes on Small Private Key Attacks on Common Prime RSA

    Authors: Mengce Zheng

    Abstract: We point out critical deficiencies in lattice-based cryptanalysis of common prime RSA presented in "Remarks on the cryptanalysis of common prime RSA for IoT constrained low power devices" [Information Sciences, 538 (2020) 54–68]. To rectify these flaws, we carefully scrutinize the relevant parameters involved in the analysis during solving a specific trivariate integer polynomial equation. Addi…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 15 pages, 1 figure

    MSC Class: 94A60