Showing 1–50 of 79 results for author: Dai, G

Searching in archive cs.
  1. arXiv:2409.01128  [pdf, other]

    cs.LG cs.CV

    Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning

    Authors: Jinglin Liang, Jin Zhong, Hanlin Gu, Zhongqi Lu, Xingxing Tang, Gang Dai, Shuangping Huang, Lixin Fan, Qiang Yang

    Abstract: Federated Class Continual Learning (FCCL) merges the challenges of distributed client learning with the need for seamless adaptation to new classes without forgetting old ones. The key challenge in FCCL is catastrophic forgetting, an issue that has been explored to some extent in Continual Learning (CL). However, due to privacy preservation requirements, some conventional methods, such as experien…

    Submitted 3 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024 Oral

  2. arXiv:2409.00597  [pdf, other]

    cs.MM cs.CL

    Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model

    Authors: Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, Bowen Zhang

    Abstract: Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the proliferation of diverse multimodal social media content including text and images, multimodal stance detection (MSD) has become a crucial research area. However, existing MSD studies have focused on modeling stance within individual text-image pa…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: ACM MM2024

  3. arXiv:2408.09613  [pdf, other]

    cs.SI cs.CY

    How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis

    Authors: Herun Wan, Minnan Luo, Zihan Ma, Guang Dai, Xiang Zhao

    Abstract: Information spreads faster through social media platforms than traditional media, thus becoming an ideal medium to spread misinformation. Meanwhile, automated accounts, known as social bots, contribute more to the misinformation dissemination. In this paper, we explore the interplay between social bots and misinformation on the Sina Weibo platform. We propose a comprehensive and large-scale misinf…

    Submitted 18 August, 2024; originally announced August 2024.

  4. arXiv:2408.07467  [pdf, other]

    cs.CV

    Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification

    Authors: Yongcheng Li, Lingcong Cai, Ying Lu, Cheng Lin, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan

    Abstract: Accurate classification of blood cells is of vital significance in the diagnosis of hematological disorders. However, in real-world scenarios, domain shifts caused by the variability in laboratory procedures and settings result in a rapid deterioration of the model's generalization performance. To address this issue, we propose a novel framework of domain-invariant representation learning (DoRL)…

    Submitted 14 August, 2024; originally announced August 2024.

  5. arXiv:2408.06716  [pdf, other]

    cs.CV

    Towards Cross-Domain Single Blood Cell Image Classification via Large-Scale LoRA-based Segment Anything Model

    Authors: Yongcheng Li, Lingcong Cai, Ying Lu, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan

    Abstract: Accurate classification of blood cells plays a vital role in hematological analysis as it aids physicians in diagnosing various medical conditions. In this study, we present a novel approach for classifying blood cell images known as BC-SAM. BC-SAM leverages the large-scale foundation model of Segment Anything Model (SAM) and incorporates a fine-tuning technique using LoRA, allowing it to extract…

    Submitted 13 August, 2024; originally announced August 2024.

  6. arXiv:2408.05503  [pdf, other]

    cs.CV cs.AI

    Disentangled Noisy Correspondence Learning

    Authors: Zhuohang Dang, Minnan Luo, Jihong Wang, Chengyou Jia, Haochen Han, Herun Wan, Guang Dai, Xiaojun Chang, Jingdong Wang

    Abstract: Cross-modal retrieval is crucial in understanding latent correspondences across modalities. However, existing methods implicitly assume well-matched training data, which is impractical as real-world data inevitably involves imperfect alignments, i.e., noisy correspondences. Although some works explore similarity-based strategies to address such noise, they suffer from sub-optimal similarity predic…

    Submitted 10 August, 2024; originally announced August 2024.

  7. arXiv:2407.19778  [pdf]

    cs.AI

    Multimodal Large Language Models for Bioimage Analysis

    Authors: Shanghang Zhang, Gaole Dai, Tiejun Huang, Jianxu Chen

    Abstract: Rapid advancements in imaging techniques and analytical methods over the past decade have revolutionized our ability to comprehensively probe the biological world at multiple scales, pinpointing the type, quantity, location, and even temporal dynamics of biomolecules. The surge in data complexity and volume presents significant challenges in translating this wealth of information into knowledge. T…

    Submitted 29 July, 2024; originally announced July 2024.

  8. arXiv:2407.15346  [pdf, other]

    cs.CV cs.CL cs.MM

    Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

    Authors: Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, QianYing Wang, Yaqiang Wu, Guang Dai, Ping Chen

    Abstract: Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acq…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Pre-print

  9. arXiv:2407.03917  [pdf, other]

    cs.CV

    Timestep-Aware Correction for Quantized Diffusion Models

    Authors: Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang

    Abstract: Diffusion models have marked a significant breakthrough in the synthesis of semantically coherent images. However, their extensive noise estimation networks and the iterative generation process limit their wider application, particularly on resource-constrained platforms like mobile devices. Existing post-training quantization (PTQ) methods have managed to compress diffusion models to low precisio…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  10. arXiv:2407.01886  [pdf, other]

    cs.LG cs.AI

    Core Knowledge Learning Framework for Graph Adaptation and Scalability Learning

    Authors: Bowen Zhang, Zhichao Huang, Genan Dai, Guangning Xu, Xiaomao Fan, Hu Huang

    Abstract: Graph classification is a pivotal challenge in machine learning, especially within the realm of graph-based data, given its importance in numerous real-world applications such as social network analysis, recommendation systems, and bioinformatics. Despite its significance, graph classification faces several hurdles, including adapting to diverse prediction tasks, training across multiple target do…

    Submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.00945  [pdf, other]

    cs.LG

    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster in…

    Submitted 30 June, 2024; originally announced July 2024.

  12. arXiv:2406.14909  [pdf, other]

    cs.LG cs.AI cs.CL

    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

    Authors: Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring thei…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 10 pages

    ACM Class: I.2.7

  13. arXiv:2406.14629  [pdf, other]

    cs.CL cs.AI

    Can LLMs Learn by Teaching? A Preliminary Study

    Authors: Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. However, for humans, teaching not only improves students but also improves teachers. We ask: Can LLMs also learn by teaching (LbT)? If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  14. arXiv:2406.14373  [pdf, other]

    cs.AI cs.CL cs.CY cs.HC cs.MA

    Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory

    Authors: Gordon Dai, Weijia Zhang, Jinhan Li, Siqi Yang, Chidera Onochie lbe, Srihas Rao, Arthur Caetano, Misha Sra

    Abstract: The emergence of Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale. Building upon prior explorations of LLM agent design, our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time. Agents are imbued with psychological drives and placed in…

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  15. arXiv:2406.12718  [pdf, other]

    cs.CV cs.AI cs.CL

    AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

    Authors: Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, QianYing Wang, Guang Dai, Ping Chen, Shijian Lu

    Abstract: Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of objec…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  16. arXiv:2406.08552  [pdf, other]

    cs.CV

    DiTFastAttn: Attention Compression for Diffusion Transformer Models

    Authors: Zhihang Yuan, Pu Lu, Hanling Zhang, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to self-attention's quadratic complexity. We propose DiTFastAttn, a novel post-training compression method to alleviate DiT's computational bottleneck. We identify three key redundancies in the attention computation during DiT inference: 1. spatial redundancy, where many attention heads focus on…

    Submitted 12 June, 2024; originally announced June 2024.

  17. arXiv:2406.02540  [pdf, other]

    cs.CV

    ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

    Authors: Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

    Abstract: Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video generation lead to increased computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an ef…

    Submitted 30 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Project Page: https://a-suozhang.xyz/viditq.github.io/

  18. arXiv:2405.19012  [pdf, other]

    cs.AI

    Implicit Neural Image Field for Biological Microscopy Image Compression

    Authors: Gaole Dai, Cheng-Ching Tseng, Qingpo Wuwu, Rongyu Zhang, Shaokang Wang, Ming Lu, Tiejun Huang, Yu Zhou, Ali Ata Tuz, Matthias Gunzer, Jianxu Chen, Shanghang Zhang

    Abstract: The rapid pace of innovation in biological microscopy imaging has led to large images, putting pressure on data storage and impeding efficient sharing, management, and visualization. This necessitates the development of efficient compression solutions. Traditional CODEC methods struggle to adapt to the diverse bioimaging data and often suffer from sub-optimal compression. In this study, we propose…

    Submitted 29 May, 2024; originally announced May 2024.

  19. arXiv:2405.17873  [pdf, other]

    cs.CV cs.AI

    MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

    Authors: Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenges for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduce the inference time by reducing the denoising steps. However, their memory consumption is still excessive. The Post Training Quantiz…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project Page: https://a-suozhang.xyz/mixdq.github.io/

  20. arXiv:2405.17761  [pdf, other]

    cs.LG math.OC

    Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

    Authors: Hao Di, Haishan Ye, Yueling Zhang, Xiangyu Chang, Guang Dai, Ivor W. Tsang

    Abstract: Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require…

    Submitted 27 May, 2024; originally announced May 2024.

  21. arXiv:2405.16486  [pdf, other]

    cs.CV cs.AI

    Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

    Authors: Rongyu Zhang, Aosong Cheng, Yulin Luo, Gaole Dai, Huanrui Yang, Jiaming Liu, Ran Xu, Li Du, Yuan Du, Yanbing Jiang, Shanghang Zhang

    Abstract: Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system'…

    Submitted 26 May, 2024; originally announced May 2024.

  22. arXiv:2405.16256  [pdf, other]

    cs.DC cs.AI

    HETHUB: A Distributed Training System with Heterogeneous Cluster for Large-Scale Models

    Authors: Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Quanlu Zhang, Haolin Ye, Sipei Gu, Chunsheng Shui, Zhezheng Lin, Hao Zhang, Sheng Wang, Guohao Dai, Yu Wang

    Abstract: Training large-scale models relies on a vast number of computing resources. For example, training the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs. It is a challenge to build a large-scale cluster with one type of GPU-accelerator. Using multiple types of GPU-accelerators to construct a large-scale cluster is an effective way to solve the problem of insufficient homogeneous GPU-a…

    Submitted 8 August, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  23. arXiv:2405.14224  [pdf, other]

    cs.CV

    DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

    Authors: Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

    Abstract: Diffusion models have achieved great success in image generation, with the backbone evolving from U-Net to Vision Transformers. However, the computational cost of Transformers is quadratic to the number of tokens, leading to significant challenges when dealing with high-resolution images. In this work, we propose Diffusion Mamba (DiM), which combines the efficiency of Mamba, a sequence model based…

    Submitted 10 July, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: The code of our work is available here: https://github.com/tyshiwo1/DiM-DiffusionMamba/

  24. arXiv:2405.02538  [pdf, other]

    cs.CV

    AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

    Authors: Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

    Abstract: Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multip…

    Submitted 3 May, 2024; originally announced May 2024.

  25. arXiv:2404.15305  [pdf, other]

    eess.SP cs.LG

    ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay

    Authors: Hyungjun Yoon, Jaehyun Kwak, Biniyam Aschalew Tolera, Gaole Dai, Mo Li, Taesik Gong, Kimin Lee, Sung-Ju Lee

    Abstract: Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine…

    Submitted 29 March, 2024; originally announced April 2024.

  26. arXiv:2404.14294  [pdf, other]

    cs.CL cs.AI

    A Survey on Efficient Inference for Large Language Models

    Authors: Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

    Abstract: Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This p…

    Submitted 19 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  27. arXiv:2404.10267  [pdf, other]

    cs.CV cs.AI

    OneActor: Consistent Character Generation via Cluster-Conditioned Guidance

    Authors: Jiahao Wang, Caixia Yan, Haonan Lin, Weizhan Zhang, Mengmeng Wang, Tieliang Gong, Guang Dai, Hao Sun

    Abstract: Text-to-image diffusion models benefit artists with high-quality image generation. Yet their stochastic nature hinders artists from creating consistent images of the same subject. Existing methods try to tackle this challenge and generate consistent content in various ways. However, they either depend on external restricted data or require expensive tuning of the diffusion model. For this issue, w…

    Submitted 12 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  28. arXiv:2404.06710  [pdf, other]

    cs.CV cs.AI

    SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera

    Authors: Gaole Dai, Zhenyu Wang, Qinwen Xu, Ming Lu, Wen Chen, Boxin Shi, Shanghang Zhang, Tiejun Huang

    Abstract: One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information…

    Submitted 12 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  29. arXiv:2404.02241  [pdf, other]

    cs.CV

    Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Diffusion Models (DM) and Consistency Models (CM) are two types of popular generative models with good generation quality on various tasks. When training DM and CM, intermediate weight checkpoints are not fully utilized and only the last converged checkpoint is used. In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by pro…

    Submitted 7 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  30. arXiv:2404.00878  [pdf, other]

    cs.CV

    TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

    Authors: Jiazheng Xing, Chao Xu, Yijie Qian, Yang Liu, Guang Dai, Baigui Sun, Yong Liu, Jingdong Wang

    Abstract: Virtual try-on focuses on adjusting the given clothes to fit a specific person seamlessly while avoiding any distortion of the patterns and textures of the garment. However, the clothing identity uncontrollability and training inefficiency of existing diffusion-based methods, which struggle to maintain the identity even with full parameter training, are significant limitations that hinder the wide…

    Submitted 31 March, 2024; originally announced April 2024.

  31. arXiv:2403.19235  [pdf, other]

    cs.CV

    DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation

    Authors: Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong Liu, Jingdong Wang

    Abstract: While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with a nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context. Existing personalization methods either require time-consuming optimization or learning additional encoders, ad…

    Submitted 28 March, 2024; originally announced March 2024.

  32. arXiv:2403.16379  [pdf, other]

    cs.CV

    FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

    Authors: Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: In recent years, there has been significant progress in the development of text-to-image generative models. Evaluating the quality of the generative models is one essential step in the development process. Unfortunately, the evaluation process could consume a significant amount of computational resources, making the required periodic evaluation of model performance (e.g., monitoring training progr…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: The paper is accepted by CVPR 2024

  33. arXiv:2403.05105  [pdf, other]

    cs.CV cs.AI cs.MM

    Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

    Authors: Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang

    Abstract: Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive multimodal data are harvested from the Internet, which inevitably contains Partially Mismatched Pairs (PMPs). Undoubtedly, such semantically irrelevant data will remarkably harm the cross-modal retrieval performance. Previous efforts tend to mitigate this proble…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  34. arXiv:2402.18158  [pdf, other]

    cs.CL cs.AI

    Evaluating Quantized Large Language Models

    Authors: Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs). Specifically, PTQ can effectively mitigate memory consumption and reduce computational overhead in LLMs. To meet the requirements of both high efficiency and performance across diverse scenarios, a comprehensive evaluation of quantized LLMs is essential to guide the selection o…

    Submitted 6 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  35. arXiv:2402.15173  [pdf, other]

    cs.LG

    Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

    Authors: Yanjun Zhao, Sizhe Dang, Haishan Ye, Guang Dai, Yi Qian, Ivor W. Tsang

    Abstract: Fine-tuning large language models (LLMs) with classic first-order optimizers entails prohibitive GPU memory due to the backpropagation process. Recent works have turned to zeroth-order optimizers for fine-tuning, which save substantial memory by using two forward passes. However, these optimizers are plagued by the heterogeneity of parameter curvatures across different dimensions. In this work, we…

    Submitted 31 August, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  36. arXiv:2402.05136  [pdf, other]

    cs.CL

    LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

    Authors: Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang

    Abstract: State-of-the-art large language models (LLMs) are now claiming remarkable supported context lengths of 256k or even more. In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation. This paper introduces LV-Eval, a challenging long-context benchmark with five le…

    Submitted 6 February, 2024; originally announced February 2024.

  37. arXiv:2401.11649  [pdf, other]

    cs.CV

    M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

    Authors: Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong Liu

    Abstract: Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient FineTuning (PEFT), has captured substantial attraction in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of compromising the models' generalization capabilities during transfer. In this paper…

    Submitted 21 January, 2024; originally announced January 2024.

    Journal ref: AAAI2024

  38. arXiv:2401.10039  [pdf, other]

    cs.CV

    GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition

    Authors: Guangzhao Dai, Xiangbo Shu, Wenhao Wu, Rui Yan, Jiachao Zhang

    Abstract: Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in Zero-Shot Egocentric Action Recognition (ZS-EAR). Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of vision and linguistic knowledge. We prop…

    Submitted 11 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  39. arXiv:2401.03868  [pdf, other]

    cs.AR cs.AI

    FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs

    Authors: Shulin Zeng, Jun Liu, Guohao Dai, Xinhao Yang, Tianyu Fu, Hongyi Wang, Wenheng Ma, Hanbo Sun, Shiyao Li, Zixiao Huang, Yadong Dai, Jintao Li, Zehao Wang, Ruoyu Zhang, Kairui Wen, Xuefei Ning, Yu Wang

    Abstract: Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLM's computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerato…

    Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted to FPGA'24

  40. arXiv:2312.16478  [pdf, other]

    cs.LG

    Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

    Authors: Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Xiaojun Chang, Jingdong Wang

    Abstract: Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice. Recently, to alleviate expensive data collection, co-occurring pairs from the Internet are automatically harvested for training. However, it inevitably includes mismatched pairs, i.e., noisy correspondences, undermining supervision reliability and degrading performance. Current methods leverage deep ne…

    Submitted 27 December, 2023; originally announced December 2023.

  41. arXiv:2312.02226  [pdf, other]

    cs.CV

    Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

    Authors: Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang

    Abstract: Exploring open-vocabulary video action recognition is a promising venture, which aims to recognize previously unseen actions within any arbitrary set of categories. Existing methods typically adapt pretrained image-text models to the video domain, capitalizing on their inherent strengths in generalization. A common thread among such methods is the augmentation of visual embeddings with temporal in…

    Submitted 3 December, 2023; originally announced December 2023.

  42. arXiv:2311.16442  [pdf, other]

    cs.LG cs.DC

    Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization

    Authors: Jinhao Li, Jiaming Xu, Shiyao Li, Shan Huang, Jun Liu, Yaoxiu Lian, Guohao Dai

    Abstract: Large language models (LLMs) have demonstrated impressive abilities in various domains while the inference cost is expensive. Many previous studies exploit quantization methods to reduce LLM inference cost by reducing latency and memory consumption. Applying 2-bit single-precision weight quantization brings >3% accuracy loss, so the state-of-the-art methods use mixed-precision methods for LLMs (e.…

    Submitted 1 July, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  43. arXiv:2311.12862  [pdf, other]

    cs.DC cs.CV cs.LG cs.PF

    TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

    Authors: Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

    Abstract: Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to i…

    Submitted 25 October, 2023; originally announced November 2023.

    Comments: MICRO 2023; Haotian Tang and Shang Yang contributed equally to this project

  44. arXiv:2311.01686  [pdf, other]

    cs.CV cs.LG

    Disentangled Representation Learning with Transmitted Information Bottleneck

    Authors: Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Jihong Wang, Xiaojun Chang, Jingdong Wang

    Abstract: Encoding only the task-related information from the raw data, i.e., disentangled representation learning, can greatly contribute to the robustness and generalizability of models. Although significant advances have been made by regularizing the information in representations with information theory, two major challenges remain: 1) the representation compression inevitably leads to performance drop;…

    Submitted 14 August, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  45. arXiv:2311.01282  [pdf, other]

    cs.LG cs.CL

    FlashDecoding++: Faster Large Language Model Inference on GPUs

    Authors: Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang

    Abstract: Large Language Models (LLMs) are becoming increasingly important in various domains. However, the following challenges still remain unsolved in accelerating LLM inference: (1) Synchronized partial softmax update. The softmax operation requires a synchronized update operation among each partial softmax result, leading to ~20% overheads for the attention computation in LLMs. (2) Under-utilized compu…

    Submitted 5 January, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  46. arXiv:2310.06218  [pdf, other]

    cs.LG cs.AI

    SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration

    Authors: Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, Yong Liu

    Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 14 pages, 4 figures, Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  47. arXiv:2309.11125  [pdf, other]

    cs.CV

    PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement

    Authors: Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Jingdong Wang

    Abstract: Dominant Person Search methods aim to localize and recognize query persons in a unified network, which jointly optimizes two sub-tasks, i.e., pedestrian detection and Re-IDentification (ReID). Despite significant progress, current methods face two primary challenges: 1) the pedestrian candidates learned within detectors are suboptimal for the ReID task. 2) the potential for collaboration between tw…

    Submitted 13 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

  48. arXiv:2308.10547  [pdf, other]

    math.OC cs.LG eess.SY

    Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold

    Authors: Jun Chen, Haishan Ye, Mengmeng Wang, Tianxin Huang, Guang Dai, Ivor W. Tsang, Yong Liu

    Abstract: The conjugate gradient method is a crucial first-order optimization method that generally converges faster than the steepest descent method, and its computational cost is much lower than that of second-order methods. However, while various types of conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, there is little study for those in distributed scenarios.…

    Submitted 12 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Journal ref: International Conference on Learning Representations, 2024

  49. arXiv:2308.10156  [pdf, other]

    cs.CV cs.AI

    SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

    Authors: Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang

    Abstract: Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls. In contrast, Layout-to-Image (L2I) generation, aiming to generate realistic and complex scene images from user-specified layouts, has risen to prominence. However, existing methods transform layout information into tokens or RGB images for co…

    Submitted 13 March, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted to AAAI 2024

    Journal ref: 38th AAAI Conference on Artificial Intelligence (AAAI2024), Vancouver, BC, Canada, 2024

  50. arXiv:2308.09346  [pdf, other]

    cs.CV

    Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

    Authors: Jiazheng Xing, Mengmeng Wang, Yudi Ruan, Bofan Chen, Yaowei Guo, Boyu Mu, Guang Dai, Jingdong Wang, Yong Liu

    Abstract: Class prototype construction and matching are core aspects of few-shot action recognition. Previous methods mainly focus on designing spatiotemporal relation modeling modules or complex temporal alignment algorithms. Despite the promising results, they ignored the value of class prototype construction and matching, leading to unsatisfactory performance in recognizing similar categories in every ta…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023