Showing 1–50 of 3,291 results for author: Wang, C

Searching in archive cs.
  1. arXiv:2409.03568  [pdf, other]

    cs.CR

    Enabling Practical and Privacy-Preserving Image Processing

    Authors: Chao Wang, Shubing Yang, Xiaoyan Sun, Jun Dai, Dongfang Zhao

    Abstract: Fully Homomorphic Encryption (FHE) enables computations on encrypted data, preserving confidentiality without the need for decryption. However, FHE is often hindered by significant performance overhead, particularly for high-precision and complex data like images. Due to serious efficiency issues, traditional FHE methods often encrypt images by monolithic data blocks (such as pixel rows), instead…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 16 pages, 10 figures

    ACM Class: C.2.0; K.6.5

  2. arXiv:2409.03363  [pdf, other]

    cs.CL

    Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding

    Authors: Cheng Wang, Yiwei Wang, Bryan Hooi, Yujun Cai, Nanyun Peng, Kai-Wei Chang

    Abstract: The training data in large language models is key to their success, but it also presents privacy and security risks, as it may contain sensitive information. Detecting pre-training data is crucial for mitigating these concerns. Existing methods typically analyze target text in isolation or solely with non-member contexts, overlooking potential insights from simultaneously considering both member a…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.03270  [pdf, other]

    cs.CV

    SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model

    Authors: Weipeng Tan, Chuming Lin, Chengming Xu, Xiaozhong Ji, Junwei Zhu, Chengjie Wang, Yanwei Fu

    Abstract: Talking Head Generation (THG), typically driven by audio, is an important and challenging task with broad application prospects in various fields such as digital humans, film production, and virtual reality. While diffusion model-based THG methods present high quality and stable content generation, they often overlook the intrinsic style which encompasses personalized features such as speaking hab…

    Submitted 5 September, 2024; originally announced September 2024.

  4. arXiv:2409.03223  [pdf, other]

    cs.CV

    Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion

    Authors: Chenguang Zhu, Shan Gao, Huafeng Chen, Guangqian Guo, Chaowei Wang, Yaoxing Wang, Chen Shu Lei, Quanjiang Fan

    Abstract: Multi-modality image fusion aims to integrate the merits of images from different sources and render high-quality fusion images. However, existing feature extraction and fusion methods are either constrained by inherent local reduction bias and static parameters during inference (CNN) or limited by quadratic computational complexity (Transformers), and cannot effectively extract and fuse features.…

    Submitted 4 September, 2024; originally announced September 2024.

  5. arXiv:2409.03220  [pdf, ps, other]

    cs.LG cs.SE

    FairQuant: Certifying and Quantifying Fairness of Deep Neural Networks

    Authors: Brian Hyeongseok Kim, Jingbo Wang, Chao Wang

    Abstract: We propose a method for formally certifying and quantifying individual fairness of deep neural networks (DNN). Individual fairness guarantees that any two individuals who are identical except for a legally protected attribute (e.g., gender or race) receive the same treatment. While there are existing techniques that provide such a guarantee, they tend to suffer from lack of scalability or accuracy…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: To Appear In Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025)

  6. arXiv:2409.02648  [pdf, other]

    cond-mat.mtrl-sci cs.CV

    Creating a Microstructure Latent Space with Rich Material Information for Multiphase Alloy Design

    Authors: Xudong Ma, Yuqi Zhang, Chenchong Wang, Ming Wang, Mingxin Huang, Wei Xu

    Abstract: The intricate microstructure serves as the cornerstone for the composition/processing-structure-property (CPSP) connection in multiphase alloys. Traditional alloy design methods often overlook microstructural details, which diminishes the reliability and effectiveness of the outcomes. This study introduces an improved alloy design algorithm that integrates authentic microstructural information to…

    Submitted 4 September, 2024; originally announced September 2024.

  7. arXiv:2409.01935  [pdf, other]

    cs.CV

    Map-Assisted Remote-Sensing Image Compression at Extremely Low Bitrates

    Authors: Yixuan Ye, Ce Wang, Wanjie Sun, Zhenzhong Chen

    Abstract: Remote-sensing (RS) image compression at extremely low bitrates has always been a challenging task in practical scenarios like edge device storage and narrow bandwidth transmission. Generative models including VAEs and GANs have been explored to compress RS images into extremely low-bitrate streams. However, these generative models struggle to reconstruct visually plausible images due to the highl…

    Submitted 3 September, 2024; originally announced September 2024.

  8. arXiv:2409.01691  [pdf, other]

    cs.CV

    When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels

    Authors: Yifan Liu, Wuyang Li, Cheng Wang, Hui Chen, Yixuan Yuan

    Abstract: Tooth point cloud segmentation is a fundamental task in many orthodontic applications. Current research mainly focuses on fully supervised learning which demands expensive and tedious manual point-wise annotation. Although recent weakly-supervised alternatives are proposed to use weak labels for 3D segmentation and achieve promising results, they tend to fail when the labels are extremely sparse.…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: To appear at MICCAI24

  9. Priority based inter-twin communication in vehicular digital twin networks

    Authors: Qasim Zia, Chenyu Wang, Saide Zhu, Yingshu Li

    Abstract: With the advancement and boom of autonomous vehicles, vehicular digital twins (VDTs) have become an emerging research area. VDT can solve the issues related to autonomous vehicles and provide improved and enhanced services to users. Recent studies have demonstrated the potential of using priorities in acquiring improved response time. However, since VDT is comprised of intra-twin and inter-twin co…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: This is an Accepted Manuscript of an article published by Taylor & Francis Group in the International Journal of Parallel, Emergent & Distributed Systems on 02 Sep 2024

  10. arXiv:2409.01662  [pdf, other]

    cs.CV

    Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation

    Authors: Haodong Wang, Chongyu Wang, Yinghui Quan, Di Wang

    Abstract: Expanding the receptive field in a deep learning model for large-scale 3D point cloud segmentation is an effective technique for capturing rich contextual information, which consequently enhances the network's ability to learn meaningful features. However, this often leads to increased computational complexity and risk of overfitting, challenging the efficiency and effectiveness of the learning pa…

    Submitted 3 September, 2024; originally announced September 2024.

  11. arXiv:2409.01652  [pdf, other]

    cs.RO cs.AI cs.CV

    ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation

    Authors: Wenlong Huang, Chen Wang, Yunzhu Li, Ruohan Zhang, Li Fei-Fei

    Abstract: Representing robotic manipulation tasks as constraints that associate the robot and the environment is a promising way to encode desired robot behaviors. However, it remains unclear how to formulate the constraints such that they are 1) versatile to diverse tasks, 2) free of manual labeling, and 3) optimizable by off-the-shelf solvers to produce robot actions in real-time. In this work, we introdu…

    Submitted 3 September, 2024; originally announced September 2024.

  12. arXiv:2409.01545  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation

    Authors: Chien-Chun Wang, Li-Wei Chen, Hung-Shin Lee, Berlin Chen, Hsin-Min Wang

    Abstract: Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs) with only limited tar…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted to IEEE SLT 2024

  13. arXiv:2409.01347  [pdf, other]

    cs.CV

    Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance

    Authors: Cunzheng Wang, Ziyuan Guo, Yuxuan Duan, Huaxia Li, Nemo Chen, Xu Tang, Yao Hu

    Abstract: Consistency distillation methods have demonstrated significant success in accelerating generative tasks of diffusion models. However, since previous consistency distillation methods use simple and straightforward strategies in selecting target timesteps, they usually struggle with blurs and detail losses in generated images. To address these limitations, we introduce Target-Driven Distillation (TD…

    Submitted 2 September, 2024; originally announced September 2024.

  14. arXiv:2409.01256  [pdf, other]

    cs.CV cs.AI

    Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling

    Authors: Haicheng Liao, Yongkang Li, Chengyue Wang, Songning Lai, Zhenning Li, Zilin Bian, Jaeyoung Lee, Zhiyong Cui, Guohui Zhang, Chengzhong Xu

    Abstract: The primary goal of traffic accident anticipation is to foresee potential accidents in real time using dashcam videos, a task that is pivotal for enhancing the safety and reliability of autonomous driving technologies. In this study, we introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods by…

    Submitted 2 September, 2024; originally announced September 2024.

  15. arXiv:2409.00726  [pdf, other]

    cs.CV cs.AI

    LPUWF-LDM: Enhanced Latent Diffusion Model for Precise Late-phase UWF-FA Generation on Limited Dataset

    Authors: Zhaojie Fang, Xiao Yu, Guanyu Zhou, Ke Zhuang, Yifei Chen, Ruiquan Ge, Changmiao Wang, Gangyong Jia, Qing Wu, Juan Ye, Maimaiti Nuliqiman, Peifang Xu, Ahmed Elazab

    Abstract: Ultra-Wide-Field Fluorescein Angiography (UWF-FA) enables precise identification of ocular diseases using sodium fluorescein, which can be potentially harmful. Existing research has developed methods to generate UWF-FA from Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) to reduce the adverse reactions associated with injections. However, these methods have been less effective in producin…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 13 pages, 7 figures

  16. arXiv:2409.00557  [pdf, other]

    cs.CL cs.AI cs.SE

    Learning to Ask: When LLMs Meet Unclear Instruction

    Authors: Wenxuan Wang, Juluan Shi, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang, Michael R. Lyu

    Abstract: Equipped with the capability to call functions, modern large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. However, the effective execution of these tools relies heavily not just on the advanced capabilities of LLMs but also on precise user instructions, which often cannot be ensured in the real world. To evaluate the…

    Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

  17. arXiv:2409.00487  [pdf, other]

    cs.CV

    TrackSSM: A General Motion Predictor by State-Space Model

    Authors: Bin Hu, Run Luo, Zelin Liu, Cheng Wang, Wenyu Liu

    Abstract: Temporal motion modeling has always been a key component in multiple object tracking (MOT) which can ensure smooth trajectory movement and provide accurate positional information to enhance association precision. However, current motion models struggle to be both efficient and effective across different application scenarios. To this end, we propose TrackSSM inspired by the recently popular state…

    Submitted 31 August, 2024; originally announced September 2024.

  18. arXiv:2409.00133  [pdf, other]

    cs.CL cs.AI

    A Survey for Large Language Models in Biomedicine

    Authors: Chong Wang, Mengyao Li, Junjun He, Zhongruo Wang, Erfan Darzi, Zan Chen, Jin Ye, Tianbin Li, Yanzhou Su, Jing Ke, Kaili Qu, Shuxin Li, Yi Yu, Pietro Liò, Tianyun Wang, Yu Guang Wang, Yiqing Shen

    Abstract: Recent breakthroughs in large language models (LLMs) offer unprecedented natural language understanding and generation capabilities. However, existing surveys on LLMs in biomedicine often focus on specific applications or model architectures, lacking a comprehensive analysis that integrates the latest advancements across various biomedical domains. This review, based on an analysis of 484 publicat…

    Submitted 29 August, 2024; originally announced September 2024.

  19. arXiv:2408.17377  [pdf, other]

    cs.CL cs.AI

    NDP: Next Distribution Prediction as a More Broad Target

    Authors: Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li, Chenglong Wang, Yuchun Fan, Yuan Ge, Tong Xiao, Jingbo Zhu

    Abstract: Large language models (LLMs) trained on next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm contains several limitations, particularly related to planned task complications and error propagation during inference. In our work, we extend the critique of NTP, highlighting its limitation also due to training with a narrow objective: the pred…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  20. arXiv:2408.17223  [pdf, other]

    cs.CV

    OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping

    Authors: Meng Wang, Junyi Wang, Changqun Xia, Chen Wang, Yue Qi

    Abstract: 3D Gaussian splatting (3DGS) has recently demonstrated promising advancements in RGB-D online dense mapping. Nevertheless, existing methods excessively rely on per-pixel depth cues to perform map densification, which leads to significant redundancy and increased sensitivity to depth noise. Additionally, explicitly storing 3D Gaussian parameters of room-scale scene poses a significant storage chall…

    Submitted 30 August, 2024; originally announced August 2024.

  21. arXiv:2408.17009  [pdf, other]

    cs.SD eess.AS

    Utilizing Speaker Profiles for Impersonation Audio Detection

    Authors: Hao Gu, JiangYan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, Xiaohui Zhang

    Abstract: Fake audio detection is an emerging active topic. A growing body of literature has aimed to detect fake utterances, which are mostly generated by Text-to-speech (TTS) or voice conversion (VC). However, countermeasures against impersonation remain an underexplored area. Impersonation is a fake type that involves an imitator replicating specific traits and speech style of a target speaker. Unlike…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM2024

  22. arXiv:2408.16966  [pdf, other]

    cs.LG cs.AI cs.CL

    UserSumBench: A Benchmark Framework for Evaluating User Summarization Approaches

    Authors: Chao Wang, Neo Wu, Lin Ning, Luyang Liu, Jun Xie, Shawn O'Banion, Bradley Green

    Abstract: Large language models (LLMs) have shown remarkable capabilities in generating user summaries from a long list of raw user activity data. These summaries capture essential user information such as preferences and interests, and therefore are invaluable for LLM-based personalization applications, such as explainable recommender systems. However, the development of new summarization techniques is hin…

    Submitted 29 August, 2024; originally announced August 2024.

  23. arXiv:2408.16942  [pdf, other]

    cs.CL cs.AI

    A longitudinal sentiment analysis of Sinophobia during COVID-19 using large language models

    Authors: Chen Wang, Rohitash Chandra

    Abstract: The COVID-19 pandemic has exacerbated xenophobia, particularly Sinophobia, leading to widespread discrimination against individuals of Chinese descent. Large language models (LLMs) are pre-trained deep learning models used for natural language processing (NLP) tasks. The ability of LLMs to understand and generate human-like text makes them particularly useful for analysing social media data to det…

    Submitted 29 August, 2024; originally announced August 2024.

  24. arXiv:2408.16659  [pdf, other]

    physics.med-ph cs.GR

    Motion-Driven Neural Optimizer for Prophylactic Braces Made by Distributed Microstructures

    Authors: Xingjian Han, Yu Jiang, Weiming Wang, Guoxin Fang, Simeon Gill, Zhiqiang Zhang, Shengfa Wang, Jun Saito, Deepak Kumar, Zhongxuan Luo, Emily Whiting, Charlie C. L. Wang

    Abstract: Joint injuries, and their long-term consequences, present a substantial global health burden. Wearable prophylactic braces are an attractive potential solution to reduce the incidence of joint injuries by limiting joint movements that are related to injury risk. Given human motion and ground reaction forces, we present a computational framework that enables the design of personalized braces by opt…

    Submitted 29 August, 2024; originally announced August 2024.

  25. arXiv:2408.16520  [pdf, other]

    cs.CV

    Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment

    Authors: Liyao Tang, Zhe Chen, Shanshan Zhao, Chaoyue Wang, Dacheng Tao

    Abstract: Label-efficient segmentation aims to perform effective segmentation on input data using only sparse and limited ground-truth labels for training. This topic is widely studied in 3D point cloud segmentation due to the difficulty of annotating point clouds densely, while it is also essential for cost-effective segmentation on 2D images. Until recently, pseudo-labels have been widely employed to faci…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Extended version of arXiv:2305.15832; Code at https://github.com/LiyaoTang/ERDA

  26. arXiv:2408.16343  [pdf, other]

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

  27. arXiv:2408.16247  [pdf, other]

    cs.CV

    Anno-incomplete Multi-dataset Detection

    Authors: Yiran Xu, Haoxiang Zhong, Kai Wu, Jialin Li, Yong Liu, Chengjie Wang, Shu-Tao Xia, Hongen Liao

    Abstract: Object detectors have shown outstanding performance on various public datasets. However, annotating a new dataset for a new task is usually unavoidable in practice, since 1) a single existing dataset usually does not contain all object categories needed; 2) using multiple datasets usually suffers from annotation incompletion and heterogeneous features. We propose a novel problem as "Annotation-incompl…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 12 pages, 9 figures

  28. arXiv:2408.16119  [pdf, other]

    cs.HC cs.AI

    Data Formulator 2: Iteratively Creating Rich Visualizations with AI

    Authors: Chenglong Wang, Bongshin Lee, Steven Drucker, Dan Marshall, Jianfeng Gao

    Abstract: To create rich visualizations, data analysts often need to iterate back and forth among data processing and chart specification to achieve their goals. To achieve this, analysts need not only proficiency in data transformation and visualization tools but also efforts to manage the branching history consisting of many different versions of data and charts. Recent LLM-powered AI systems have greatly…

    Submitted 28 August, 2024; originally announced August 2024.

  29. arXiv:2408.15916  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-modal Adversarial Training for Zero-Shot Voice Cloning

    Authors: John Janiczek, Dading Chong, Dongyang Dai, Arlo Faria, Chao Wang, Tao Wang, Yuzong Liu

    Abstract: A text-to-speech (TTS) model trained to reconstruct speech given text tends towards predictions that are close to the average characteristics of a dataset, failing to model the variations that make human speech sound natural. This problem is magnified for zero-shot voice cloning, a task that requires training data with high variance in speaking styles. We build off of recent works which have used…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted at INTERSPEECH 2024

  30. arXiv:2408.15247  [pdf, other]

    cs.SE cs.AI cs.CL cs.HC cs.LG

    AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems

    Authors: Victor Dibia, Jingya Chen, Gagan Bansal, Suff Syed, Adam Fourney, Erkang Zhu, Chi Wang, Saleema Amershi

    Abstract: Multi-agent systems, where multiple agents (generative AI models + tools) collaborate, are emerging as an effective pattern for solving long-running, complex tasks in numerous domains. However, specifying their parameters (such as models, tools, and orchestration mechanisms) and debugging them remains challenging for most developers. To address this challenge, we present AUTOGEN STUDIO, a no…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 8 pages

  31. arXiv:2408.15203  [pdf, other]

    cs.DC cs.IT

    On the Encoding Process in Decentralized Systems

    Authors: Canran Wang, Netanel Raviv

    Abstract: We consider the problem of encoding information in a system of N=K+R processors that operate in a decentralized manner, i.e., without a central processor which orchestrates the operation. The system involves K source processors, each holding some data modeled as a vector over a finite field. The remaining R processors are sinks, each of which requires a linear combination of all data vectors.…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2205.05183

  32. arXiv:2408.15045  [pdf, other]

    cs.CV

    DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding

    Authors: Wenhui Liao, Jiapeng Wang, Hongliang Li, Chengyu Wang, Jun Huang, Lianwen Jin

    Abstract: Text-rich document understanding (TDU) refers to analyzing and comprehending documents containing substantial textual content. With the rapid evolution of large language models (LLMs), they have been widely leveraged for TDU due to their remarkable versatility and generalization. In this paper, we introduce DocLayLLM, an efficient and effective multi-modal extension of LLMs specifically designed f…

    Submitted 28 August, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

  33. arXiv:2408.15038  [pdf, other]

    cs.CV

    Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data

    Authors: Lintao Xu, Chaohui Wang

    Abstract: Occlusion boundaries (OBs) geometrically localize the occlusion events in a 2D image, and contain useful information for addressing various scene understanding problems. To advance their study, we have led the investigation in the following three aspects. Firstly, we have studied interactive estimation of OBs, which is the first in the literature, and proposed an efficient deep-network-based metho…

    Submitted 27 August, 2024; originally announced August 2024.

  34. arXiv:2408.14909  [pdf, other]

    cs.CL cs.LG cs.NE

    SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

    Authors: Shuaijie Shen, Chao Wang, Renzhuo Huang, Yan Zhong, Qinghai Guo, Zhichao Lu, Jianguo Zhang, Luziwei Leng

    Abstract: Known as low energy consumption networks, spiking neural networks (SNNs) have gained a lot of attention within the past decades. While SNNs are increasingly competitive with artificial neural networks (ANNs) for vision tasks, they are rarely used for long sequence tasks, despite their intrinsic temporal dynamics. In this work, we develop spiking state space models (SpikingSSMs) for long sequence lea…

    Submitted 27 August, 2024; originally announced August 2024.

  35. arXiv:2408.14674  [pdf, other]

    cs.CV

    gWaveNet: Classification of Gravity Waves from Noisy Satellite Data using Custom Kernel Integrated Deep Learning Method

    Authors: Seraj Al Mahmud Mostafa, Omar Faruque, Chenxi Wang, Jia Yue, Sanjay Purushotham, Jianwu Wang

    Abstract: Atmospheric gravity waves occur in the Earth's atmosphere, caused by an interplay between gravity and buoyancy forces. These waves have profound impacts on various aspects of the atmosphere, including the patterns of precipitation, cloud formation, ozone distribution, aerosols, and pollutant dispersion. Therefore, understanding gravity waves is essential to comprehend and monitor changes in a wide r…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted at the 27th International Conference on Pattern Recognition (ICPR) 2024

  36. arXiv:2408.14016  [pdf, other]

    cs.CV cs.AI

    Pixel-Aligned Multi-View Generation with Depth Guided Decoder

    Authors: Zhenggang Tang, Peiye Zhuang, Chaoyang Wang, Aliaksandr Siarohin, Yash Kant, Alexander Schwing, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: The task of image-to-multi-view generation refers to generating novel views of an instance from a single image. Recent methods achieve this by extending text-to-image latent diffusion models to a multi-view version, which contains a VAE image encoder and a U-Net diffusion model. Specifically, these generation methods usually fix VAE and finetune the U-Net only. However, the significant downscaling…

    Submitted 26 August, 2024; originally announced August 2024.

  37. arXiv:2408.14000  [pdf, other]

    cs.RO

    Quantitative Representation of Scenario Difficulty for Autonomous Driving Based on Adversarial Policy Search

    Authors: Shuo Yang, Caojun Wang, Yuanjian Zhang, Yuming Yin, Yanjun Huang, Shengbo Eben Li, Hong Chen

    Abstract: Adversarial scenario generation is crucial for autonomous driving testing because it can efficiently simulate various challenging and complex traffic conditions. However, it is difficult to control existing methods to generate desired scenarios, such as the ones with different conflict levels. Therefore, this paper proposes a data-driven quantitative method to represent scenario difficulty.…

    Submitted 25 August, 2024; originally announced August 2024.

  38. arXiv:2408.13591  [pdf, other]

    stat.ML cs.LG

    Optimal Kernel Quantile Learning with Random Features

    Authors: Caixing Wang, Xingdong Feng

    Abstract: The random feature (RF) approach is a well-established and efficient tool for scalable kernel methods, but existing literature has primarily focused on kernel ridge regression with random features (KRR-RF), which has limitations in handling heterogeneous data with heavy-tailed noises. This paper presents a generalization study of kernel quantile regression with random features (KQR-RF), which acco…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 34 pages, 8 figures, 3 tables

  39. arXiv:2408.13509  [pdf, other]

    cs.CV

    DualAnoDiff: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation

    Authors: Ying Jin, Jinlong Peng, Qingdong He, Teng Hu, Hao Chen, Jiafu Wu, Wenbing Zhu, Mingmin Chi, Jun Liu, Yabiao Wang, Chengjie Wang

    Abstract: The performance of anomaly inspection in industrial manufacturing is constrained by the scarcity of anomaly data. To overcome this challenge, researchers have started employing anomaly generation approaches to augment the anomaly dataset. However, existing anomaly generation methods suffer from limited diversity in the generated anomalies and struggle to achieve a seamless blending of this anomaly…

    Submitted 28 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: Code: https://github.com/yinyjin/DualAnoDiff

  40. arXiv:2408.13461  [pdf, other]

    cs.CV cs.AI

    Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach

    Authors: Jiwei Guan, Tianyu Ding, Longbing Cao, Lei Pan, Chen Wang, Xi Zheng

    Abstract: Vision-language pretraining (VLP) with transformers has demonstrated exceptional performance across numerous multimodal tasks. However, the adversarial robustness of these models has not been thoroughly investigated. Existing multimodal attack methods have largely overlooked cross-modal interactions between visual and textual modalities, particularly in the context of cross-attention mechanisms. I…

    Submitted 24 August, 2024; originally announced August 2024.

  41. arXiv:2408.13005  [pdf, other]

    cs.CV

    EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation

    Authors: Cong Wang, Jiaxi Gu, Panwen Hu, Haoyu Zhao, Yuanfan Guo, Jianhua Han, Hang Xu, Xiaodan Liang

    Abstract: Following the advancements in text-guided image generation technology exemplified by Stable Diffusion, video generation is gaining increased attention in the academic community. However, relying solely on text guidance for video generation has serious limitations, as videos contain much richer content than images, especially in terms of motion. This information can hardly be adequately described w…

    Submitted 23 August, 2024; originally announced August 2024.

  42. arXiv:2408.12800  [pdf, other]

    cs.MM

    Cap2Sum: Learning to Summarize Videos by Generating Captions

    Authors: Cairong Zhao, Chutian Wang, Zifan Song, Guosheng Hu, Haonan Chen, Xiaofan Zhai

    Abstract: With the rapid growth of video data on the internet, video summarization is becoming a very important AI technology. However, due to the high labelling cost of video summarization, existing studies have to be conducted on small-scale datasets, leading to limited performance and generalization capacity. In this work, we introduce the use of dense video captions as a supervision signal to train vide…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 13 pages, 4 figures

  43. arXiv:2408.12353  [pdf, other]

    stat.ML cs.LG math.ST

    Distributed quasi-Newton robust estimation under differential privacy

    Authors: Chuhan Wang, Lixing Zhu, Xuehu Zhu

    Abstract: For distributed computing with Byzantine machines under Privacy Protection (PP) constraints, this paper develops a robust PP distributed quasi-Newton estimation, which only requires the node machines to transmit five vectors to the central processor with high asymptotic relative efficiency. Compared with the gradient descent strategy which requires more rounds of transmission and the Newton iterat…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 38 pages, 6 figures

  44. arXiv:2408.12340  [pdf, other]

    cs.CV

    VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

    Authors: Yujie Liang, Xiaobin Hu, Boyuan Jiang, Donghao Luo, Kai WU, Wenhui Han, Taisong Jin, Chengjie Wang

    Abstract: Although diffusion-based image virtual try-on has made considerable progress, emerging approaches still struggle to effectively address the issue of hand occlusion (i.e., clothing regions occluded by the hand part), leading to a notable degradation of the try-on performance. To tackle this issue widely existing in real-world scenarios, we propose VTON-HandFit, leveraging the power of hand priors t…

    Submitted 26 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: The project page is https://vton-handfit.github.io

  45. arXiv:2408.12190  [pdf, other]

    cs.RO

    A Safety-Oriented Self-Learning Algorithm for Autonomous Driving: Evolution Starting from a Basic Model

    Authors: Shuo Yang, Caojun Wang, Zhenyu Ma, Yanjun Huang, Hong Chen

    Abstract: Autonomous driving vehicles with self-learning capabilities are expected to evolve in complex environments to improve their ability to cope with different scenarios. However, most self-learning algorithms suffer from low learning efficiency and lacking safety, which limits their applications. This paper proposes a safety-oriented self-learning algorithm for autonomous driving, which focuses on how…

    Submitted 22 August, 2024; originally announced August 2024.

  46. arXiv:2408.12109  [pdf, other]

    cs.CV cs.CL

    RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

    Authors: Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu

    Abstract: Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the sc…

    Submitted 21 August, 2024; originally announced August 2024.

  47. arXiv:2408.11475  [pdf, other]

    cs.CV

    TrackGo: A Flexible and Efficient Method for Controllable Video Generation

    Authors: Haitao Zhou, Chuang Wang, Rui Nie, Jinxiao Lin, Dongdong Yu, Qian Yu, Changhu Wang

    Abstract: Recent years have seen substantial progress in diffusion-based controllable video generation. However, achieving precise control in complex scenarios, including fine-grained object parts, sophisticated motion trajectories, and coherent background movement, remains a challenge. In this paper, we introduce TrackGo, a novel approach that leverages free-form masks and arrows for conditional video gene…

    Submitted 21 August, 2024; originally announced August 2024.

  48. arXiv:2408.11393  [pdf, other]

    cs.CL cs.LG

    First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

    Authors: Chi Ma, Mincong Huang, Ying Zhang, Chao Wang, Yujie Wang, Lei Yu, Chuan Liu, Wei Lin

    Abstract: Dynamic activation (DA) techniques, such as DejaVu and MoEfication, have demonstrated their potential to significantly enhance the inference efficiency of large language models (LLMs). However, these techniques often rely on ReLU activation functions or require additional parameters and training to maintain performance. This paper introduces a training-free Threshold-based Dynamic Activation(TDA)…

    Submitted 21 August, 2024; originally announced August 2024.

  49. arXiv:2408.10487  [pdf, other]

    cs.CV cs.AI

    MambaEVT: Event Stream based Visual Object Tracking using State Space Model

    Authors: Xiao Wang, Chao wang, Shiao Wang, Xixi Wang, Zhicheng Zhao, Lin Zhu, Bo Jiang

    Abstract: Event camera-based visual tracking has drawn more and more attention in recent years due to the unique imaging principle and advantages of low energy consumption, high dynamic range, and dense temporal resolution. Current event-based tracking algorithms are gradually hitting their performance bottlenecks, due to the utilization of vision Transformer and the static template for target object locali…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  50. arXiv:2408.10202  [pdf, other]

    cs.CV

    SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

    Authors: Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma

    Abstract: Large-scale vision-language models, such as CLIP, are known to contain harmful societal bias regarding protected attributes (e.g., gender and age). In this paper, we aim to address the problems of societal bias in CLIP. Although previous studies have proposed to debias societal bias through adversarial learning or test-time projecting, our comprehensive study of these works identifies two critical…

    Submitted 19 August, 2024; originally announced August 2024.