
Showing 1–50 of 2,376 results for author: chen, C

Searching in archive cs.
  1. arXiv:2407.20445  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation

    Authors: Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, Julian McAuley

    Abstract: Existing music captioning methods are limited to generating concise global descriptions of short music clips, which fail to capture fine-grained musical characteristics and time-aware musical changes. To address these limitations, we propose FUTGA, a model equipped with fine-grained music understanding capabilities through learning from generative augmentation with temporal compositions. We lever…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 6 pages

  2. arXiv:2407.20372  [pdf, other]

    cs.CV

    A Model Generalization Study in Localizing Indoor Cows with COw LOcalization (COLO) dataset

    Authors: Mautushi Das, Gonzalo Ferreira, C. P. James Chen

    Abstract: Precision livestock farming (PLF) increasingly relies on advanced object localization techniques to monitor livestock health and optimize resource management. This study investigates the generalization capabilities of YOLOv8 and YOLOv9 models for cow detection in indoor free-stall barn settings, focusing on varying training data characteristics such as view angles and lighting, and model complexit…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

    MSC Class: C.4; E.0

  3. arXiv:2407.20224  [pdf, other]

    cs.CL

    Can Editing LLMs Inject Harm?

    Authors: Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

    Abstract: Knowledge editing techniques have been increasingly adopted to efficiently correct the false or outdated knowledge in Large Language Models (LLMs), due to the high cost of retraining from scratch. Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat f…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io

  4. arXiv:2407.19453  [pdf, other]

    cs.CV

    FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models

    Authors: Changgu Chen, Libing Yang, Xiaoyan Yang, Lianggangxu Chen, Gaoqi He, Changbo Wang, Yang Li

    Abstract: In recent years, large-scale pre-trained diffusion models have demonstrated their outstanding capabilities in image and video generation tasks. However, existing models tend to produce visual objects commonly found in the training dataset, which diverges from user input prompts. The underlying reason behind the inaccurate generated results lies in the model's difficulty in sampling from specific i…

    Submitted 28 July, 2024; originally announced July 2024.

  5. ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models

    Authors: Peiming Li, Ziyi Wang, Mengyuan Liu, Hong Liu, Chen Chen

    Abstract: Grasp generation aims to create complex hand-object interactions with a specified object. While traditional approaches for hand generation have primarily focused on visibility and diversity under scene constraints, they tend to overlook the fine-grained hand-object interactions such as contacts, resulting in inaccurate and undesired grasps. To address these challenges, we propose a controllable gr…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: ACM Multimedia 2024

  6. arXiv:2407.19185  [pdf, other]

    cs.CV cs.AI

    LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models

    Authors: Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun

    Abstract: Large multimodal language models have demonstrated impressive capabilities in understanding and manipulating images. However, many of these models struggle with comprehending intensive textual contents embedded within the images, primarily due to the limited text recognition and layout understanding ability. To understand the sources of these limitations, we perform an exploratory analysis showing…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: NeurIPS 2024 Under Review

  7. arXiv:2407.18170  [pdf, other]

    cs.LG

    RIDA: A Robust Attack Framework on Incomplete Graphs

    Authors: Jianke Yu, Hanchen Wang, Chen Chen, Xiaoyang Wang, Wenjie Zhang, Ying Zhang

    Abstract: Graph Neural Networks (GNNs) are vital in data science but are increasingly susceptible to adversarial attacks. To help researchers develop more robust GNN models, it's essential to focus on designing strong attack models as foundational benchmarks and guiding references. Among adversarial attacks, gray-box poisoning attacks are noteworthy due to their effectiveness and fewer constraints. These at…

    Submitted 25 July, 2024; originally announced July 2024.

  8. arXiv:2407.17786  [pdf, other]

    cs.CV cs.GR

    Topology-Preserving Downsampling of Binary Images

    Authors: Chia-Chia Chen, Chi-Han Peng

    Abstract: We present a novel discrete optimization-based approach to generate downsampled versions of binary images that are guaranteed to have the same topology as the original, measured by the zeroth and first Betti numbers of the black regions, while having good similarity to the original image as measured by IoU and Dice scores. To the best of our knowledge, all existing binary image downsampling methods do no…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted to The 18th European Conference on Computer Vision (ECCV) 2024

  9. arXiv:2407.17622  [pdf, other]

    cs.LG cs.CY

    Towards Neural Network based Cognitive Models of Dynamic Decision-Making by Humans

    Authors: Changyu Chen, Shashank Reddy Chirra, Maria José Ferreira, Cleotilde Gonzalez, Arunesh Sinha, Pradeep Varakantham

    Abstract: Modelling human cognitive processes in dynamic decision-making tasks has been an endeavor in AI for a long time. Some initial works have attempted to utilize neural networks (and large language models) but often assume one common model for all humans and aim to emulate human behavior in aggregate. However, the behavior of each human is distinct, heterogeneous and relies on specific past experiences in…

    Submitted 24 July, 2024; originally announced July 2024.

  10. arXiv:2407.17571  [pdf, other]

    cs.CV

    Diffusion Models for Multi-Task Generative Modeling

    Authors: Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

    Abstract: Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to single-generation modeling. Can we generalize diffusion models with the ability of multi-modal generative training for more generalizable modeling? In this paper, we propose a principled way to define a diffusion model by constructing a unifi…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Published as a conference paper at ICLR 2024

  11. arXiv:2407.17485  [pdf, other]

    physics.chem-ph cs.ET physics.data-an

    Application of the Digital Annealer Unit in Optimizing Chemical Reaction Conditions for Enhanced Production Yields

    Authors: Shih-Cheng Li, Pei-Hwa Wang, Jheng-Wei Su, Wei-Yin Chiang, Shih-Hsien Huang, Yen-Chu Lin, Chia-Ho Ou, Chih-Yu Chen

    Abstract: Finding appropriate reaction conditions that yield high product rates in chemical synthesis is crucial for the chemical and pharmaceutical industries. However, due to the vast chemical space, conducting experiments for each possible reaction condition is impractical. Consequently, models such as QSAR (Quantitative Structure-Activity Relationship) or ML (Machine Learning) have been developed to pre…

    Submitted 2 July, 2024; originally announced July 2024.

  12. arXiv:2407.17035  [pdf, other]

    cs.CV

    Q-Ground: Image Quality Grounding with Large Multi-modality Models

    Authors: Chaofeng Chen, Sensen Yang, Haoning Wu, Liang Liao, Zicheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

    Abstract: Recent advances in large multi-modality models (LMMs) have greatly improved the ability of image quality assessment (IQA) methods to evaluate and explain the quality of visual content. However, these advancements are mostly focused on overall quality assessment, and the detailed examination of local quality, which is crucial for comprehensive visual understanding, is still largely unexplored. In thi…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: ACM Multimedia 2024 (Oral)

  13. arXiv:2407.16959  [pdf, other]

    cs.LG

    Dynamic Graph Transformer with Correlated Spatial-Temporal Positional Encoding

    Authors: Zhe Wang, Sheng Zhou, Jiawei Chen, Zhen Zhang, Binbin Hu, Yan Feng, Chun Chen, Can Wang

    Abstract: Learning effective representations for Continuous-Time Dynamic Graphs (CTDGs) has garnered significant research interest, largely due to its powerful capabilities in modeling complex interactions between nodes. A fundamental and crucial requirement for representation learning in CTDGs is the appropriate estimation and preservation of proximity. However, due to the sparse and evolving characteristi…

    Submitted 23 July, 2024; originally announced July 2024.

  14. arXiv:2407.16725  [pdf, other]

    cs.CV

    Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions

    Authors: Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye

    Abstract: The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP have provided significant advances on both issues, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spur…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  15. arXiv:2407.16560  [pdf, other]

    cs.CV cs.DC

    COALA: A Practical and Vision-Centric Federated Learning Platform

    Authors: Weiming Zhuang, Jian Xu, Chen Chen, Jingtao Li, Lingjuan Lyu

    Abstract: We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize into three levels: task, data, and model. At the task level, COALA extends support from simple classification to 15 computer vision tasks, including object detection, segmentation, pose estimation, and more. It also facilitates federated multiple-task learn…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: ICML'24

  16. arXiv:2407.16434  [pdf, other]

    cs.CL

    Enhancing LLM's Cognition via Structurization

    Authors: Kai Liu, Zhihang Fu, Chao Chen, Wei Zhang, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, Jieping Ye

    Abstract: When reading long-form text, human cognition is complex and structurized. While large language models (LLMs) process input contexts through a causal and sequential perspective, this approach can potentially limit their ability to handle intricate and complex inputs effectively. To enhance LLM's cognition capability, this paper presents a novel concept of context structurization. Specifically, we t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: N/A

  17. arXiv:2407.16430  [pdf, other]

    cs.CV

    Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

    Authors: Kai Liu, Zhihang Fu, Sheng Jin, Chao Chen, Ze Chen, Rongxin Jiang, Fan Zhou, Yaowu Chen, Jieping Ye

    Abstract: Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to avoid unreliable predictions. In real-world scenarios, however, the efficacy of existing OOD detection methods is often impeded by the inherent imbalance of in-distribution (ID) data, which causes significant performance decline. Through statistical observations, we have identified two comm…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: N/A

  18. arXiv:2407.15894  [pdf, other]

    cs.CV

    Craft: Cross-modal Aligned Features Improve Robustness of Prompt Tuning

    Authors: Jingchen Sun, Rohan Sharma, Vishnu Suresh Lokhande, Changyou Chen

    Abstract: Prompt Tuning has emerged as a prominent research paradigm for adapting vision-language models to various downstream tasks. However, recent research indicates that prompt tuning methods often lead to overfitting due to limited training samples. In this paper, we propose a Cross-modal Aligned Feature Tuning (Craft) method to address this issue. Cross-modal alignment is conducted by first selecting…

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: 15 pages

  19. arXiv:2407.15880  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Diff4VS: HIV-inhibiting Molecules Generation with Classifier Guidance Diffusion for Virtual Screening

    Authors: Jiaqing Lyu, Changjie Chen, Bing Liang, Yijia Zhang

    Abstract: The AIDS epidemic has killed 40 million people and caused serious global problems. The identification of new HIV-inhibiting molecules is of great importance for combating the AIDS epidemic. Here, the Classifier Guidance Diffusion model and ligand-based virtual screening strategy are combined to discover potential HIV-inhibiting molecules for the first time. We call it Diff4VS. An extra classifier…

    Submitted 20 July, 2024; originally announced July 2024.

  20. arXiv:2407.15862  [pdf]

    cs.LG cs.AI cs.CL cs.CY

    Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis

    Authors: Qiuhong Wei, Ying Cui, Mengwei Ding, Yanqin Wang, Lingling Xiang, Zhengxiong Yao, Ceran Chen, Ying Long, Zhezhen Jin, Ximing Xu

    Abstract: Large language models (LLMs) have demonstrated potential applications in medicine, yet data privacy and computational burden limit their deployment in healthcare institutions. Open-source and lightweight versions of LLMs emerge as potential solutions, but their performance, particularly in pediatric settings, remains underexplored. In this cross-sectional study, 250 patient consultation questions w…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 27 pages in total with 17 pages of main manuscript and 10 pages of supplementary materials; 4 figures in the main manuscript and 2 figures in supplementary material

    MSC Class: 68M20 (Primary) 62G10 (Secondary)

  21. arXiv:2407.15842  [pdf, other]

    cs.CV cs.GR

    Artist: Aesthetically Controllable Text-Driven Stylization without Training

    Authors: Ruixiang Jiang, Changwen Chen

    Abstract: Diffusion models entangle content and style generation during the denoising process, leading to undesired content modification when directly applied to stylization tasks. Existing methods struggle to effectively control the diffusion model to meet the aesthetic-level requirements for stylization. In this paper, we introduce Artist, a training-free approach that aesthetically controls the…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: WIP, webpage: https://DiffusionArtist.github.io

  22. arXiv:2407.15706  [pdf, other]

    cs.CV

    Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition

    Authors: Jinfu Liu, Chen Chen, Mengyuan Liu

    Abstract: Skeleton-based action recognition has garnered significant attention due to the utilization of concise and resilient skeletons. Nevertheless, the absence of detailed body information in skeletons restricts performance, while other multimodal methods require substantial inference resources and are inefficient when using multimodal data during both training and inference stages. To address this and…

    Submitted 29 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  23. arXiv:2407.15642  [pdf, other]

    cs.CV

    Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

    Authors: Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Yuan-Fang Li, Cunjian Chen, Yu Qiao

    Abstract: Diffusion models have achieved great progress in image animation due to powerful generative capabilities. However, maintaining spatio-temporal consistency with detailed information from the input static image over time (e.g., style, background, and object of the input static image) and ensuring smoothness in animated video narratives guided by textual prompts still remains challenging. In this pap…

    Submitted 22 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Project webpage: https://maxin-cn.github.io/cinemo_project/

  24. arXiv:2407.15488  [pdf, other]

    cs.CV

    DiffX: Guide Your Layout to Cross-Modal Generative Modeling

    Authors: Zeyu Wang, Jingyu Lin, Yifei Qian, Yi Huang, Shicen Tian, Bosong Chai, Juncan Deng, Lan Du, Cunjian Chen, Yufei Guo, Kejie Huang

    Abstract: Diffusion models have made significant strides in language-driven and layout-driven image generation. However, most diffusion models are limited to visible RGB image generation. In fact, human perception of the world is enriched by diverse viewpoints, including chromatic contrast, thermal illumination, and depth information. In this paper, we introduce a novel diffusion model for general layout-gu…

    Submitted 28 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  25. Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

    Authors: Yiyang Jiang, Wengyu Zhang, Xulu Zhang, Xiaoyong Wei, Chang Wen Chen, Qing Li

    Abstract: In this paper, we investigate the feasibility of leveraging large language models (LLMs) for integrating general knowledge and incorporating pseudo-events as priors for temporal content distribution in video moment retrieval (VMR) models. The motivation behind this study arises from the limitations of using LLMs as decoders for generating discrete textual descriptions, which hinders their direct a…

    Submitted 22 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM Multimedia 2024

  26. arXiv:2407.14568  [pdf, other]

    cs.CL cs.AI cs.DB

    SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy

    Authors: Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun Wang, Xudong Zhao, Hang Yu, Jianchao Wang, Jianguo Li, Wenhui Shi

    Abstract: Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced…

    Submitted 19 July, 2024; originally announced July 2024.

  27. arXiv:2407.14414  [pdf, other]

    cs.AI cs.CL cs.LG

    System-1.x: Learning to Balance Fast and Slow Planning with Language Models

    Authors: Swarnadeep Saha, Archiki Prasad, Justin Chih-Yao Chen, Peter Hase, Elias Stengel-Eskin, Mohit Bansal

    Abstract: Language models can be used to solve long-horizon planning problems in two distinct modes: a fast 'System-1' mode, directly generating plans without any explicit search or backtracking, and a slow 'System-2' mode, planning step-by-step by explicitly searching over possible actions. While System-2 is typically more effective, it is also more computationally expensive, making it infeasible for long…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 29 pages (10 tables)

  28. arXiv:2407.14081  [pdf, other]

    cs.LG cs.AI cs.IR cs.SI

    DisenSemi: Semi-supervised Graph Classification via Disentangled Representation Learning

    Authors: Yifan Wang, Xiao Luo, Chong Chen, Xian-Sheng Hua, Ming Zhang, Wei Ju

    Abstract: Graph classification is a critical task in numerous multimedia applications, where graphs are employed to represent diverse types of multimedia data, including images, videos, and social networks. Nevertheless, in real-world scenarios, labeled graph data can be limited or scarce. To address this issue, we focus on the problem of semi-supervised graph classification, which involves both supervised…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS 2024)

  29. Identifying Smart Contract Security Issues in Code Snippets from Stack Overflow

    Authors: Jiachi Chen, Chong Chen, Jiang Hu, John Grundy, Yanlin Wang, Ting Chen, Zibin Zheng

    Abstract: Smart contract developers frequently seek solutions to developmental challenges on Q&A platforms such as Stack Overflow (SO). Although community responses often provide viable solutions, the embedded code snippets can also contain hidden vulnerabilities. Integrating such code directly into smart contracts may make them susceptible to malicious attacks. We conducted an online survey and received 74…

    Submitted 23 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  30. arXiv:2407.13193  [pdf, other]

    cs.CL

    Retrieval-Augmented Generation for Natural Language Processing: A Survey

    Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Large language models (LLMs) have demonstrated great success in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database…

    Submitted 18 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  31. arXiv:2407.12939  [pdf, other]

    cs.CV

    GenRC: Generative 3D Room Completion from Sparse Image Collections

    Authors: Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y. C. Chen, Cheng-Hao Kuo, Min Sun

    Abstract: Sparse RGBD scene completion is a challenging task especially when considering consistent textures and geometries throughout the entire scene. Different from existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures. To achieve this, we first proje…

    Submitted 18 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  32. arXiv:2407.12810  [pdf]

    cs.NI

    A Study on the Situation of Connected Car Patent Portfolios

    Authors: Abel C. H. Chen, Chia-Shen Chang

    Abstract: In recent years, the countries of the world have drafted the specifications of connected cars; for instance, the Security Credential Management System (SCMS) has been proposed by the United States Department of Transportation (USDOT), and the Cooperative Intelligent Transportation System (C-ITS) Credential Management System (CCMS) has been proposed by the European Union (EU). Therefore, several companies…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: in Chinese language

  33. arXiv:2407.12701  [pdf, other]

    cs.CR

    Efficient and Flexible Different-Radix Montgomery Modular Multiplication for Hardware Implementation

    Authors: Yuxuan Zhang, Hua Guo, Chen Chen, Yewei Guan, Xiyong Zhang, Zhenyu Guan

    Abstract: Montgomery modular multiplication is widely used in public key cryptosystems (PKC) and directly affects the efficiency of the systems built on them. However, the modulus is growing due to the increasing demand for security, which results in a heavy computing cost. High-performance implementation of Montgomery modular multiplication is urgently required to ensure highly efficient operations in PKC. Howev…

    Submitted 17 July, 2024; originally announced July 2024.

  34. arXiv:2407.12442  [pdf, other]

    cs.CV

    ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

    Authors: Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

    Abstract: Despite the success of large-scale pretrained Vision-Language Models (VLMs) especially CLIP in various open-vocabulary tasks, their application to semantic segmentation remains challenging, producing noisy segmentation maps with mis-segmented regions. In this paper, we carefully re-investigate the architecture of CLIP, and identify residual connections as the primary source of noise that degrades…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Code available at https://github.com/mc-lan/ClearCLIP

  35. arXiv:2407.12322  [pdf, other]

    cs.CV

    Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

    Authors: Wenhan Wu, Ce Zheng, Zihao Yang, Chen Chen, Srijan Das, Aidong Lu

    Abstract: Recently, transformers have demonstrated great potential for modeling long-term dependencies from skeleton sequences and thereby gained ever-increasing attention in skeleton action recognition. However, the existing transformer-based approaches heavily rely on the naive attention mechanism for capturing the spatiotemporal features, which falls short in learning discriminative representations that…

    Submitted 29 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  36. arXiv:2407.12176  [pdf, other]

    cs.CY cs.AI cs.CL

    GPT-4V Cannot Generate Radiology Reports Yet

    Authors: Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan

    Abstract: GPT-4V's purported strong multimodal abilities raise interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and fi…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 24 pages, 3 figures, code: https://github.com/YuyangJ0/GPT-4V-evaluation-radiology-report

  37. arXiv:2407.12112  [pdf, other]

    cs.LG cs.CY cs.SI

    A Benchmark for Fairness-Aware Graph Learning

    Authors: Yushun Dong, Song Wang, Zhenyu Lei, Zaiyi Zheng, Jing Ma, Chen Chen, Jundong Li

    Abstract: Fairness-aware graph learning has gained increasing attention in recent years. Nevertheless, a comprehensive benchmark to evaluate and compare different fairness-aware graph learning methods is lacking, which blocks practitioners from choosing appropriate ones for broader real-world applications. In this paper, we present an extensive benchmark on ten representative fairness-aware graph learning…

    Submitted 16 July, 2024; originally announced July 2024.

  38. arXiv:2407.11073  [pdf, other]

    cs.CR cs.CV cs.LG

    SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images

    Authors: Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu

    Abstract: Adversarial attack has garnered considerable attention due to its profound implications for the secure deployment of robots in sensitive security scenarios. To potentially push for advances in the field, this paper studies the adversarial attack in the black-box setting and proposes an unlabeled data-driven adversarial attack method, called SemiAdv. Specifically, SemiAdv achieves the following bre…

    Submitted 12 July, 2024; originally announced July 2024.

  39. arXiv:2407.10180  [pdf, other]

    cs.CV

    Defending Against Repetitive-based Backdoor Attacks on Semi-supervised Learning through Lens of Rate-Distortion-Perception Trade-off

    Authors: Cheng-Yi Lee, Ching-Chia Kao, Cheng-Han Yeh, Chun-Shien Lu, Chia-Mu Yu, Chu-Song Chen

    Abstract: Semi-supervised learning (SSL) has achieved remarkable performance with a small fraction of labeled data by leveraging vast amounts of unlabeled data from the Internet. However, this large pool of untrusted data is extremely vulnerable to data poisoning, leading to potential backdoor attacks. Current backdoor defenses are not yet effective against such a vulnerability in SSL. In this study, we pro…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: under review

  40. arXiv:2407.10102  [pdf, other]

    cs.CV

    3DEgo: 3D Editing on the Go!

    Authors: Umar Khalid, Hasan Iqbal, Azib Farooq, Jing Hua, Chen Chen

    Abstract: We introduce 3DEgo to address a novel problem of directly synthesizing photorealistic 3D scenes from monocular videos guided by textual prompts. Conventional methods construct a text-conditioned 3D scene through a three-stage process, involving pose estimation using Structure-from-Motion (SfM) libraries like COLMAP, initializing the 3D model with unedited images, and iteratively updating the datas…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 Accepted Paper

  41. arXiv:2407.09935  [pdf, other]

    cs.CV cs.MM eess.IV

    LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation

    Authors: Jiacheng Li, Chang Chen, Fenglong Song, Youliang Yan, Zhiwei Xiong

    Abstract: Image resampling is a basic technique that is widely employed in daily applications, such as camera photo editing. Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors. Still, these methods are not the perfect substitute for interpolation, due to the drawbacks in efficiency and versatility. In this work, we propose a novel method of Lea…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/ddlee-cn/LeRF-PyTorch

  42. Performance Comparison of Various Modes of Advanced Encryption Standard

    Authors: Abel C. H. Chen

    Abstract: With the maturation of quantum computing technology, many cryptographic methods are gradually facing threats from quantum computing. Although the Grover algorithm can accelerate search speeds, current research indicates that the Advanced Encryption Standard (AES) method can still enhance security by increasing the length of the secret key. However, the AES method involves multiple modes in impleme…

    Submitted 21 May, 2024; originally announced July 2024.

    Comments: in Chinese language

  43. arXiv:2407.09019  [pdf, other]

    cs.SI cs.AI

    Heterogeneous Subgraph Network with Prompt Learning for Interpretable Depression Detection on Social Media

    Authors: Chen Chen, Mingwei Li, Fenghuan Li, Haopeng Chen, Yuankun Lin

    Abstract: Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users.…

    Submitted 12 July, 2024; originally announced July 2024.

  44. arXiv:2407.09018  [pdf, other]

    cs.SE

    AUITestAgent: Automatic Requirements Oriented GUI Function Testing

    Authors: Yongxiang Hu, Xuan Wang, Yingchuan Wang, Yu Zhang, Shiyu Guo, Chaoyi Chen, Xin Wang, Yangfan Zhou

    Abstract: The Graphical User Interface (GUI) is how users interact with mobile apps. To ensure it functions properly, testing engineers have to make sure it functions as intended, based on test requirements that are typically written in natural language. While widely adopted manual testing and script-based methods are effective, they demand substantial effort due to the vast number of GUI pages and rapid it…

    Submitted 12 July, 2024; originally announced July 2024.

  45. arXiv:2407.08850  [pdf, other]

    cs.HC cs.AI

    UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset

    Authors: Peitong Duan, Chin-yi Chen, Gang Li, Bjoern Hartmann, Yang Li

    Abstract: Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that aut…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  46. arXiv:2407.08713  [pdf, other]

    cs.CL cs.AI

    GTA: A Benchmark for General Tool Agents

    Authors: Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, Xinyi Le

    Abstract: Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, fa…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Github repo: https://github.com/open-compass/GTA

  47. arXiv:2407.08134  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Highway Networks for Improved Surface Reconstruction: The Role of Residuals and Weight Updates

    Authors: A. Noorizadegan, Y. C. Hon, D. L. Young, C. S. Chen

    Abstract: Surface reconstruction from point clouds is a fundamental challenge in computer graphics and medical imaging. In this paper, we explore the application of advanced neural network architectures for the accurate and efficient reconstruction of surfaces from data points. We introduce a novel variant of the Highway network (Hw) called Square-Highway (SqrHw) within the context of multilayer perceptrons…

    Submitted 10 July, 2024; originally announced July 2024.

  48. arXiv:2407.07375  [pdf, ps, other]

    cs.AI math.NA

    Stable Weight Updating: A Key to Reliable PDE Solutions Using Deep Learning

    Authors: A. Noorizadegan, R. Cavoretto, D. L. Young, C. S. Chen

    Abstract: Background: Deep learning techniques, particularly neural networks, have revolutionized computational physics, offering powerful tools for solving complex partial differential equations (PDEs). However, ensuring stability and efficiency remains a challenge, especially in scenarios involving nonlinear and time-dependent equations. Methodology: This paper introduces novel residual-based architecture…

    Submitted 10 July, 2024; originally announced July 2024.

  49. arXiv:2407.07347  [pdf, other]

    cs.CV eess.IV

    MNeRV: A Multilayer Neural Representation for Videos

    Authors: Qingling Chang, Haohui Yu, Shuxuan Fu, Zhiqiang Zeng, Chuangquan Chen

    Abstract: As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in the fields of video compression, video restoration, and video interpolation. In the process of representing videos using NeRV, each frame corresponds to an embedding, which is then reconstructed into a video frame sequence after passing through a small number of decoding layers (E-NeRV, HN…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures, 8 tables

  50. arXiv:2407.06770  [pdf, other]

    cs.RO

    Pretraining-finetuning Framework for Efficient Co-design: A Case Study on Quadruped Robot Parkour

    Authors: Ci Chen, Jiyu Yu, Haojian Lu, Hongbo Gao, Rong Xiong, Yue Wang

    Abstract: In nature, animals with exceptional locomotion abilities, such as cougars, often possess asymmetric fore and hind legs, with their powerful hind legs acting as reservoirs of energy for leaps. This observation inspired us: could optimizing the leg length of quadruped robots endow them with similar locomotive capabilities? In this paper, we propose an approach that co-optimizes the mechanical structur…

    Submitted 9 July, 2024; originally announced July 2024.