Showing 1–50 of 4,007 results for author: Liu, X

Searching in archive cs. Search in all archives.
  1. arXiv:2407.20893  [pdf, other]

    cs.LG cs.AI eess.SP

    MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network

    Authors: Yinlong Xu, Xiaoqiang Liu, Zitai Kong, Yixuan Wu, Yue Wang, Yingzhou Lu, Honghao Gao, Jian Wu, Hongxia Xu

    Abstract: Cardiac arrhythmia, a condition characterized by irregular heartbeats, often serves as an early indication of various heart ailments. With the advent of deep learning, numerous innovative models have been introduced for diagnosing arrhythmias using Electrocardiogram (ECG) signals. However, recent studies solely focus on the performance of models, neglecting the interpretation of their results. Thi…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.20271  [pdf, other]

    cs.LG cs.AI cs.CL

    Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models

    Authors: Haoyu Tang, Ye Liu, Xukai Liu, Kai Zhang, Yanghai Zhang, Qi Liu, Enhong Chen

    Abstract: Recent advancements in machine learning, especially in Natural Language Processing (NLP), have led to the development of sophisticated models trained on vast datasets, but this progress has raised concerns about potential sensitive information leakage. In response, regulatory measures like the EU General Data Protection Regulation (GDPR) have driven the exploration of Machine Unlearning techniques…

    Submitted 25 July, 2024; originally announced July 2024.

  3. arXiv:2407.20143  [pdf, other]

    cs.AI

    ByteCheckpoint: A Unified Checkpointing System for LLM Development

    Authors: Borui Wan, Mingji Han, Yiyao Sheng, Zhichao Lai, Mofan Zhang, Junda Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu

    Abstract: The development of real-world Large Language Models (LLMs) necessitates checkpointing of training states in persistent storage to mitigate potential software and hardware failures, as well as to facilitate checkpoint transferring within the training pipeline and across various tasks. Due to the immense size of LLMs, saving and loading checkpoints often incur intolerable minute-level stalls, signif…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.20124  [pdf, other]

    cs.MM cs.AI

    AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics

    Authors: Xiangxiang Dai, Zeyu Zhang, Peng Yang, Yuedong Xu, Xutong Liu, John C. S. Lui

    Abstract: The rapid evolution of multimedia and computer vision technologies requires adaptive visual model deployment strategies to effectively handle diverse tasks and varying environments. This work introduces AxiomVision, a novel framework that can guarantee accuracy by leveraging edge computing to dynamically select the most efficient visual models for video analytics under diverse scenarios. Utilizing…

    Submitted 30 July, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  5. arXiv:2407.19988  [pdf, other]

    cs.MM

    HeadsetOff: Enabling Photorealistic Video Conferencing on Economical VR Headsets

    Authors: Yili Jin, Xize Duan, Fangxin Wang, Xue Liu

    Abstract: Virtual Reality (VR) headsets have become increasingly popular for remote collaboration, but video conferencing poses challenges when the user's face is covered by the headset. Existing solutions have limitations in terms of accessibility. In this paper, we propose HeadsetOff, a novel system that achieves photorealistic video conferencing on economical VR headsets by leveraging voice-driven face r…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  6. arXiv:2407.19981  [pdf, other]

    cs.CV

    Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter

    Authors: Chao Liu, Xin Liu, Zitong Yu, Yonghong Hou, Huanjing Yue, Jingyu Yang

    Abstract: Deep neural networks (DNNs) have been applied in many computer vision tasks and achieved state-of-the-art (SOTA) performance. However, misclassification will occur when DNNs predict adversarial examples which are created by adding human-imperceptible adversarial noise to natural examples. This limits the application of DNN in security-critical fields. In order to enhance the robustness of models,…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCB 2024

  7. arXiv:2407.19875  [pdf, other]

    cs.CV

    Exploring Robust Face-Voice Matching in Multilingual Environments

    Authors: Jiehui Tang, Xiaofei Wang, Zhen Xiao, Jiayi Liu, Xueliang Liu, Richang Hong

    Abstract: This paper presents Team Xaiofei's innovative approach to exploring Face-Voice Association in Multilingual Environments (FAME) at ACM Multimedia 2024. We focus on the impact of different languages in face-voice matching by building upon Fusion and Orthogonal Projection (FOP), introducing four key components: a dual-branch structure, dynamic sample pair weighting, robust data augmentation, and scor…

    Submitted 29 July, 2024; originally announced July 2024.

  8. arXiv:2407.19768  [pdf, other]

    cs.CV

    Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network

    Authors: Wenjie Li, Heng Guo, Xuannan Liu, Kongming Liang, Jiani Hu, Zhanyu Ma, Jun Guo

    Abstract: Face super-resolution aims to reconstruct a high-resolution face image from a low-resolution face image. Previous methods typically employ an encoder-decoder structure to extract facial structural features, where the direct downsampling inevitably introduces distortions, especially to high-frequency features such as edges. To address this issue, we propose a wavelet-based feature enhancement netwo…

    Submitted 30 July, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  9. arXiv:2407.19721  [pdf, other]

    cs.NI cs.AI cs.DC

    Rina: Enhancing Ring-AllReduce with In-network Aggregation in Distributed Model Training

    Authors: Zixuan Chen, Xuandong Liu, Minglin Li, Yinfan Hu, Hao Mei, Huifeng Xing, Hao Wang, Wanxin Shi, Sen Liu, Yang Xu

    Abstract: Parameter Server (PS) and Ring-AllReduce (RAR) are two widely utilized synchronization architectures in multi-worker Deep Learning (DL), also referred to as Distributed Deep Learning (DDL). However, PS encounters challenges with the "incast" issue, while RAR struggles with problems caused by the long dependency chain. The emerging In-network Aggregation (INA) has been proposed to integrate with…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: To appear in ICNP 2024. Preview version only

  10. arXiv:2407.19547  [pdf, other]

    cs.CV

    Temporal Feature Matters: A Framework for Diffusion Model Quantization

    Authors: Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao

    Abstract: Diffusion models, widely used for image generation, face significant challenges related to their broad applicability due to prolonged inference times and high memory demands. Efficient Post-Training Quantization (PTQ) is crucial to address these issues in traditional models. Unlike those models, diffusion models critically rely on the time-step $t$ for effective multi-round denoising. Typicall…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.16503

  11. arXiv:2407.19524  [pdf, other]

    cs.CV cs.AI

    VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary

    Authors: Hanjun Luo, Ziye Deng, Haoyu Huang, Xuecheng Liu, Ruizhe Chen, Zuozhu Liu

    Abstract: With the rapid development of Text-to-Image models, biases in human image generation against demographic social groups attract more and more concern. Existing methods are designed based on certain models with fixed prompts, unable to accommodate the trend of high-speed updating of Text-to-Image (T2I) models and variable prompts in practical scenes. Additionally, they fail to consider the possibil…

    Submitted 28 July, 2024; originally announced July 2024.

  12. arXiv:2407.19166  [pdf, other]

    cs.CV

    Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

    Authors: Shengjie Zhu, Xiaoming Liu

    Abstract: Both self-supervised depth estimation and Structure-from-Motion (SfM) recover scene depth from RGB videos. Despite sharing a similar objective, the two approaches are disconnected. Prior works of self-supervision backpropagate losses defined within immediate neighboring frames. Instead of learning-through-loss, this work proposes an alternative scheme by performing local SfM. First, with calibrate…

    Submitted 27 July, 2024; originally announced July 2024.

  13. arXiv:2407.19154  [pdf, other]

    cs.CV

    RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry

    Authors: Shengjie Zhu, Girish Chandar Ganesan, Abhinav Kumar, Xiaoming Liu

    Abstract: 3D sensing is a fundamental task for Autonomous Vehicles. Its deployment often relies on aligned RGB cameras and LiDAR. Despite meticulous synchronization and calibration, systematic misalignment persists in the LiDAR-projected depthmap. This is due to the physical baseline distance between the two sensors. The artifact is often reflected as background LiDAR incorrectly projected onto the foreground,…

    Submitted 26 July, 2024; originally announced July 2024.

  14. arXiv:2407.18957  [pdf, other]

    q-fin.TR cs.AI cs.MA

    When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

    Authors: Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhenting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang

    Abstract: Can AI Agents simulate real-world trading environments to investigate the impact of external factors on stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large langu…

    Submitted 30 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 33 pages, 10 figures

  15. arXiv:2407.18955  [pdf, other]

    cs.CV

    Real Face Video Animation Platform

    Authors: Xiaokai Chen, Xuan Liu, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su

    Abstract: In recent years, facial video generation models have gained popularity. However, these models often lack expressive power when dealing with exaggerated anime-style faces due to the absence of high-quality anime-style face training sets. We propose a facial animation platform that enables real-time conversion from real human faces to cartoon-style faces, supporting multiple models. Built on the Gra…

    Submitted 12 July, 2024; originally announced July 2024.

  16. arXiv:2407.18937  [pdf]

    cs.IR cs.LG

    Advancements in Recommender Systems: A Comprehensive Analysis Based on Data, Algorithms, and Evaluation

    Authors: Xin Ma, Mingyue Li, Xuguang Liu

    Abstract: Using 286 research papers collected from Web of Science, ScienceDirect, SpringerLink, arXiv, and Google Scholar databases, a systematic review methodology was adopted to review and summarize the current challenges and potential future developments in data, algorithms, and evaluation aspects of RSs. It was found that RSs involve five major research topics, namely algorithmic improvement, domain app…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 24 pages, 10 figures, 3 tables

  17. FedUD: Exploiting Unaligned Data for Cross-Platform Federated Click-Through Rate Prediction

    Authors: Wentao Ouyang, Rui Dong, Ri Tao, Xiangzheng Liu

    Abstract: Click-through rate (CTR) prediction plays an important role in online advertising platforms. Most existing methods use data from the advertising platform itself for CTR prediction. As user behaviors also exist on many other platforms, e.g., media platforms, it is beneficial to further exploit such complementary information for better modeling user interest and for improving CTR prediction performa…

    Submitted 25 July, 2024; originally announced July 2024.

  18. arXiv:2407.18451  [pdf, other]

    cs.RO

    Gaussian Lane Keeping: A Robust Prediction Baseline

    Authors: David Isele, Piyush Gupta, Xinyi Liu, Sangjae Bae

    Abstract: Predicting agents' behavior for vehicles and pedestrians is challenging due to a myriad of factors including the uncertainty attached to different intentions, inter-agent interactions, traffic (environment) rules, individual inclinations, and agent dynamics. Consequently, a plethora of neural network-driven prediction models have been introduced in the literature to encompass these intricacies to…

    Submitted 25 July, 2024; originally announced July 2024.

  19. arXiv:2407.18070  [pdf, other]

    eess.IV cs.CV

    CSWin-UNet: Transformer UNet with Cross-Shaped Windows for Medical Image Segmentation

    Authors: Xiao Liu, Peng Gao, Tao Yu, Fei Wang, Ru-Yue Yuan

    Abstract: Deep learning, especially convolutional neural networks (CNNs) and Transformer architectures, has become the focus of extensive research in medical image segmentation, achieving impressive results. However, CNNs come with inductive biases that limit their effectiveness in more complex, varied segmentation scenarios. Conversely, while Transformer-based methods excel at capturing global and long-ra…

    Submitted 25 July, 2024; originally announced July 2024.

  20. arXiv:2407.17942  [pdf, other]

    cs.RO cs.IT

    A Novel Perception Entropy Metric for Optimizing Vehicle Perception with LiDAR Deployment

    Authors: Yongjiang He, Peng Cao, Zhongling Su, Xiaobo Liu

    Abstract: Developing an effective evaluation metric is crucial for accurately and swiftly measuring LiDAR perception performance. One major issue is the lack of metrics that can simultaneously generate fast and accurate evaluations based on either object detection or point cloud data. In this study, we propose a novel LiDAR perception entropy metric based on the probability of vehicle grid occupancy. This m…

    Submitted 25 July, 2024; originally announced July 2024.

  21. DualFed: Enjoying both Generalization and Personalization in Federated Learning via Hierachical Representations

    Authors: Guogang Zhu, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Xinghao Wu, Jiayuan Zhang

    Abstract: In personalized federated learning (PFL), it is widely recognized that achieving both high model generalization and effective personalization poses a significant challenge due to their conflicting nature. As a result, existing PFL methods can only manage a trade-off between these two objectives. This raises an interesting question: Is it feasible to develop a model capable of achieving both object…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  22. arXiv:2407.17398  [pdf, other]

    cs.CV

    3D Question Answering for City Scene Understanding

    Authors: Penglei Sun, Yaoxian Song, Xiang Liu, Xiaofei Yang, Qiang Wang, Tiefeng Li, Yang Yang, Xiaowen Chu

    Abstract: 3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments. While existing research has primarily focused on indoor household tasks and outdoor roadside autonomous driving tasks, there has been limited exploration of city-level scene understanding tasks. Furthermore, existing research faces c…

    Submitted 24 July, 2024; originally announced July 2024.

  23. arXiv:2407.16955  [pdf, other]

    cs.CV cs.RO

    DVPE: Divided View Position Embedding for Multi-View 3D Object Detection

    Authors: Jiasen Wang, Zhenglin Li, Ke Sun, Xianyuan Liu, Yang Zhou

    Abstract: Sparse query-based paradigms have achieved significant success in multi-view 3D detection for autonomous vehicles. Current research faces challenges in balancing between enlarging receptive fields and reducing interference when aggregating multi-view features. Moreover, different poses of cameras present challenges in training global attention models. To address these problems, this paper proposes…

    Submitted 23 July, 2024; originally announced July 2024.

  24. arXiv:2407.16341  [pdf, other]

    cs.CV

    Motion Capture from Inertial and Vision Sensors

    Authors: Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei

    Abstract: Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 17 pages, 9 figures

  25. arXiv:2407.16307  [pdf, other]

    cs.MM cs.CR

    Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning

    Authors: Xinwei Liu, Xiaojun Jia, Yuan Xun, Siyuan Liang, Xiaochun Cao

    Abstract: Multimodal contrastive learning (MCL) has shown remarkable advances in zero-shot classification by learning from millions of image-caption pairs crawled from the Internet. However, this reliance poses privacy risks, as hackers may exploit image-text data for model training without authorization, potentially including personal and privacy-sensitive information. Recent works propose generating unlearnable…

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: ACM MM2024

  26. arXiv:2407.16139  [pdf, other]

    cs.LG

    Tackling Feature-Classifier Mismatch in Federated Learning via Prompt-Driven Feature Transformation

    Authors: Xinghao Wu, Jianwei Niu, Xuefeng Liu, Mingjia Shi, Guogang Zhu, Shaojie Tang

    Abstract: In traditional Federated Learning approaches like FedAvg, the global model underperforms when faced with data heterogeneity. Personalized Federated Learning (PFL) enables clients to train personalized models to fit their local data distribution better. However, we surprisingly find that the feature extractor in FedAvg is superior to those in most PFL methods. More interestingly, by applying a line…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 23 pages, 9 figures

  27. arXiv:2407.16133  [pdf, other]

    cs.CV

    Open-Set Biometrics: Beyond Good Closed-Set Models

    Authors: Yiyang Su, Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu

    Abstract: Biometric recognition has primarily addressed closed-set identification, assuming all probe subjects are in the gallery. However, most practical applications involve open-set biometrics, where probe subjects may or may not be present in the gallery. This poses distinct challenges in effectively distinguishing individuals in the gallery while minimizing false detections. While it is commonly believ…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Published at ECCV 2024

  28. arXiv:2407.16115  [pdf, other]

    cs.LG cs.AI

    Transformer-based Graph Neural Networks for Battery Range Prediction in AIoT Battery-Swap Services

    Authors: Zhao Li, Yang Liu, Chuan Zhou, Xuanwu Liu, Xuming Pan, Buqing Cao, Xindong Wu

    Abstract: The concept of the sharing economy has gained broad recognition, and within this context, Sharing E-Bike Batteries (SEBs) have emerged as a focal point of societal interest. Despite the popularity, a notable discrepancy remains between user expectations regarding the remaining battery range of SEBs and the reality, leading to a pronounced inclination among users to find an available SEB during emerge…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 9 pages, 6 figures; accepted by IEEE ICWS 2024 (the International Conference on Web Services)

  29. arXiv:2407.15620  [pdf, other]

    cs.IR cs.LG

    Dual Test-time Training for Out-of-distribution Recommender System

    Authors: Xihong Yang, Yiqi Wang, Jin Chen, Wenqi Fan, Xiangyu Zhao, En Zhu, Xinwang Liu, Defu Lian

    Abstract: Deep learning has been widely applied in recommender systems, which has achieved revolutionary progress recently. However, most existing learning-based methods assume that the user and item distributions remain unchanged between the training phase and the test phase. In real-world scenarios, the distribution of user and item features can naturally shift, potentially resulting in a substant…

    Submitted 22 July, 2024; originally announced July 2024.

  30. arXiv:2407.15464  [pdf, other]

    cs.LG cs.DC

    The Diversity Bonus: Learning from Dissimilar Distributed Clients in Personalized Federated Learning

    Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Guogang Zhu, Shaojie Tang, Xiaotian Li, Jiannong Cao

    Abstract: Personalized Federated Learning (PFL) is a commonly used framework that allows clients to collaboratively train their personalized models. PFL is particularly useful for handling situations where data from different clients are not independent and identically distributed (non-IID). Previous research in PFL implicitly assumes that clients can gain more benefits from those with similar data distribu…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures

  31. arXiv:2407.15399  [pdf, other]

    cs.CL cs.AI cs.CR

    Imposter.AI: Adversarial Attacks with Hidden Intentions towards Aligned Large Language Models

    Authors: Xiao Liu, Liangzhi Li, Tong Xiang, Fuying Ye, Lu Wei, Wangyue Li, Noa Garcia

    Abstract: With the development of large language models (LLMs) like ChatGPT, both their vast applications and potential vulnerabilities have come to the forefront. While developers have integrated multiple safety mechanisms to mitigate their misuse, a risk remains, particularly when models encounter adversarial inputs. This study unveils an attack mechanism that capitalizes on human conversation strategies…

    Submitted 22 July, 2024; originally announced July 2024.

  32. arXiv:2407.15328  [pdf, other]

    cs.CV

    Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models

    Authors: Xiao Liu, Xiaoliu Guan, Yu Wu, Jiaxu Miao

    Abstract: Diffusion models, known for their tremendous ability to generate novel and high-quality samples, have recently raised concerns due to their data memorization behavior, which poses privacy risks. Recent approaches for memory mitigation either only focused on the text modality problem in cross-modal generation tasks or utilized data augmentation strategies. In this paper, we propose a novel training…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: To appear in ECCV 2024, 20 pages with 7 figures

  33. arXiv:2407.15240  [pdf, other]

    cs.CV

    BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM

    Authors: Hanjun Luo, Haoyu Huang, Ziye Deng, Xuecheng Liu, Ruizhe Chen, Zuozhu Liu

    Abstract: Text-to-Image (T2I) generative models are becoming more crucial in terms of their ability to generate complex and high-quality images, which also raises concerns about the social biases in their outputs, especially in human generation. Sociological research has established systematic classifications of bias; however, existing research of T2I models often conflates different types of bias, hinderin…

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.17814

  34. arXiv:2407.15176  [pdf, other]

    cs.CL cs.AI

    Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope

    Authors: Xiaoran Liu, Qipeng Guo, Yuerong Song, Zhigeng Liu, Kai Lv, Hang Yan, Linlin Li, Qun Liu, Xipeng Qiu

    Abstract: The maximum supported context length is a critical bottleneck limiting the practical application of the Large Language Model (LLM). Although existing length extrapolation methods can extend the context of LLMs to millions of tokens, these methods all have an explicit upper bound. In this work, we propose LongCache, a training-free approach that enables LLM to support an infinite context with finit…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 8 pages, 7 figures

  35. arXiv:2407.15066  [pdf, other]

    cs.CV

    LSReGen: Large-Scale Regional Generator via Backward Guidance Framework

    Authors: Bowen Zhang, Cheng Yang, Xuanhui Liu

    Abstract: In recent years, advancements in AIGC (Artificial Intelligence Generated Content) technology have significantly enhanced the capabilities of large text-to-image models. Despite these improvements, controllable image generation remains a challenge. Current methods, such as training, forward guidance, and backward guidance, have notable limitations. The first two approaches either demand substantial…

    Submitted 21 July, 2024; originally announced July 2024.

  36. arXiv:2407.14928  [pdf, other]

    cs.SE cs.HC

    Influencer: Empowering Everyday Users in Creating Promotional Posts via AI-infused Exploration and Customization

    Authors: Xuye Liu, Annie Sun, Pengcheng An, Tengfei Ma, Jian Zhao

    Abstract: Creating promotional posts on social platforms enables everyday users to disseminate their creative outcomes, engage in community exchanges, or generate additional income from micro-businesses. However, creating eye-catching posts combining both original, appealing images and articulate, effective captions can be rather challenging and time-consuming for everyday users who are mostly design novice…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 18 pages

  37. arXiv:2407.14505  [pdf, other]

    cs.CV

    T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

    Authors: Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu

    Abstract: Text-to-video (T2V) generation models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks also neglect this important ability for evaluation. In this work, we conduct the first systematic study on compositional text-to-video generation. We propose T2V-CompBench, the first be…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages (30 in total), project page: https://t2v-compbench.github.io/

  38. arXiv:2407.14143  [pdf, other]

    cs.CV cs.LG

    Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion

    Authors: Linlan Huang, Xusheng Cao, Haori Lu, Xialei Liu

    Abstract: Class-incremental learning is a challenging problem, where the goal is to train a model that can classify data from an increasing number of classes over time. With the advancement of vision-language pre-trained models such as CLIP, they demonstrate good generalization ability that allows them to excel in class-incremental learning with completely frozen parameters. However, further adaptation to d…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  39. arXiv:2407.14142  [pdf, other]

    cs.CV cs.LG

    Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation

    Authors: Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao, Enguang Wang, Le Zhang, Xialei Liu

    Abstract: Class incremental semantic segmentation aims to preserve old knowledge while learning new tasks; however, it is impeded by catastrophic forgetting and background shift issues. Prior works indicate the pivotal importance of initializing new classifiers and mainly focus on transferring knowledge from the background classifier or preparing classifiers for future classes, neglecting the flexibility an…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  40. arXiv:2407.14007  [pdf, other]

    cs.CV cs.AI

    Multi-modal Relation Distillation for Unified 3D Representation Learning

    Authors: Huiqun Wang, Yiping Bao, Panwang Pan, Zeming Li, Xiao Liu, Ruijie Yang, Di Huang

    Abstract: Recent advancements in multi-modal pre-training for 3D point clouds have demonstrated promising results by aligning heterogeneous features across 3D shapes and their corresponding 2D images and language descriptions. However, current straightforward solutions often overlook intricate structural relations among samples, potentially limiting the full capabilities of multi-modal learning. To address…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  41. arXiv:2407.13782  [pdf, other]

    eess.AS cs.AI cs.SD

    Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu

    Abstract: Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  42. arXiv:2407.13757  [pdf, other]

    cs.CL cs.AI cs.CR

    Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

    Authors: Zhuo Chen, Jiawei Liu, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu

    Abstract: Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) mod…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 10 pages, 3 figures, under review

  43. arXiv:2407.13719  [pdf, other]

    cs.CV

    HazeCLIP: Towards Language Guided Real-World Image Dehazing

    Authors: Ruiyi Wang, Wenhao Li, Xiaohong Liu, Chunyi Li, Zicheng Zhang, Xiongkuo Min, Guangtao Zhai

    Abstract: Existing methods have achieved remarkable performance in single image dehazing, particularly on synthetic datasets. However, they often struggle with real-world hazy images due to domain shift, limiting their practical applicability. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks. Inspired by th…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  44. arXiv:2407.13545  [pdf, other]

    eess.IV cs.CV

    DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays

    Authors: Xuhui Liu, Zhi Qiao, Runkun Liu, Hong Li, Juan Zhang, Xiantong Zhen, Zhen Qian, Baochang Zhang

    Abstract: Computed tomography (CT) is widely utilized in clinical settings because it delivers detailed 3D images of the human body. However, performing CT scans is not always feasible due to radiation exposure and limitations in certain surgical environments. As an alternative, reconstructing CT images from ultra-sparse X-rays offers a valuable solution and has gained significant interest in scientific res…

    Submitted 18 July, 2024; originally announced July 2024.

  45. arXiv:2407.13193  [pdf, other]

    cs.CL

    Retrieval-Augmented Generation for Natural Language Processing: A Survey

    Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Large language models (LLMs) have demonstrated great success in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database…

    Submitted 18 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  46. arXiv:2407.12996  [pdf, other]

    stat.ML cs.LG

    Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance

    Authors: Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang

    Abstract: Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and o…

    Submitted 17 July, 2024; originally announced July 2024.

  47. arXiv:2407.12718  [pdf, other]

    cs.CV

    SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow

    Authors: Yuanzhi Zhu, Xingchao Liu, Qiang Liu

    Abstract: Diffusion models excel in high-quality generation but suffer from slow inference due to iterative sampling. While recent methods have successfully transformed diffusion models into one-step generators, they neglect model size reduction, limiting their applicability in compute-constrained scenarios. This paper aims to develop small, efficient one-step diffusion models based on the powerful rectifie…

    Submitted 17 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  48. arXiv:2407.12611  [pdf, other]

    cs.CV

    Deep Mutual Learning among Partially Labeled Datasets for Multi-Organ Segmentation

    Authors: Xiaoyu Liu, Linhao Qu, Ziyue Xie, Yonghong Shi, Zhijian Song

    Abstract: The task of labeling multiple organs for segmentation is a complex and time-consuming process, resulting in a scarcity of comprehensively labeled multi-organ datasets while the emergence of numerous partially labeled datasets. Current methods are inadequate in effectively utilizing the supervised information available from these datasets, thereby impeding the progress in improving the segmentation…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 10 pages, 9 figures

  49. arXiv:2407.12448  [pdf, other]

    cs.LG

    Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

    Authors: Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu

    Abstract: Combining offline and online reinforcement learning (RL) techniques is indeed crucial for achieving efficient and safe learning where data acquisition is expensive. Existing methods replay offline data directly in the online phase, resulting in a significant challenge of data distribution shift and subsequently causing inefficiency in online fine-tuning. To address this issue, we introduce an inno…

    Submitted 17 July, 2024; originally announced July 2024.

  50. arXiv:2407.12431  [pdf, other]

    cs.CV

    GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval

    Authors: Han Zhou, Wei Dong, Xiaohong Liu, Shuaicheng Liu, Xiongkuo Min, Guangtao Zhai, Jun Chen

    Abstract: Most existing Low-light Image Enhancement (LLIE) methods either directly map Low-Light (LL) to Normal-Light (NL) images or use semantic or illumination maps as guides. However, the ill-posed nature of LLIE and the difficulty of semantic retrieval from impaired inputs limit these methods, especially in extremely low-light conditions. To address this issue, we present a new LLIE network via Generati…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024