Showing 1–50 of 1,664 results for author: Wang, B

Searching in archive cs. Search in all archives.
  1. arXiv:2407.20647  [pdf, other]

    cs.CV

    Image Re-Identification: Where Self-supervision Meets Vision-Language Learning

    Authors: Bin Wang, Yuying Liang, Lei Cai, Huakun Huang, Huanqiang Zeng

    Abstract: Recently, large-scale vision-language pre-trained models like CLIP have shown impressive performance in image re-identification (ReID). In this work, we explore whether self-supervision can aid in the use of CLIP for image ReID tasks. Specifically, we propose SVLL-ReID, the first attempt to integrate self-supervision and pre-trained CLIP via two training stages to facilitate the image ReID. We obs…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.19829  [pdf, other]

    cs.IR cs.AI

    Generative Retrieval with Preference Optimization for E-commerce Search

    Authors: Mingming Li, Huimu Wang, Zuxu Chen, Guangtao Nie, Yiming Qiu, Binbin Wang, Guoyu Tang, Lin Liu, Jingwei Zhuo

    Abstract: Generative retrieval introduces a groundbreaking paradigm to document retrieval by directly generating the identifier of a pertinent document in response to a specific query. This paradigm has demonstrated considerable benefits and potential, particularly in representation and generalization capabilities, within the context of large language models. However, it faces significant challenges in E-co…

    Submitted 29 July, 2024; originally announced July 2024.

  3. arXiv:2407.19719  [pdf, other]

    cs.CV

    Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images

    Authors: Jiaxin Zhang, Yunqin Li, Tomohiro Fukuda, Bowen Wang

    Abstract: Measuring urban safety perception is an important and complex task that traditionally relies heavily on human resources. This process often involves extensive field surveys, manual data collection, and subjective assessments, which can be time-consuming, costly, and sometimes inconsistent. Street View Images (SVIs), along with deep learning methods, provide a way to realize large-scale urban safet…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  4. arXiv:2407.19703  [pdf, other]

    cs.CR

    Efficient Byzantine-Robust and Provably Privacy-Preserving Federated Learning

    Authors: Chenfei Nie, Qiang Li, Yuxin Yang, Yuede Ji, Binghui Wang

    Abstract: Federated learning (FL) is an emerging distributed learning paradigm without sharing participating clients' private data. However, existing works show that FL is vulnerable to both Byzantine (security) attacks and data reconstruction (privacy) attacks. Almost all the existing FL defenses only address one of the two attacks. A few defenses address the two attacks, but they are not efficient and eff…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 13 pages

  5. arXiv:2407.19467  [pdf, other]

    cs.IR cs.LG

    Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights

    Authors: Xiang-Rong Sheng, Feifan Yang, Litong Gong, Biao Wang, Zhangming Chan, Yujing Zhang, Yueyao Cheng, Yong-Nan Zhu, Tiezheng Ge, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance the recommendation accuracy. We start from identifying the key challenges in adopting mul…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted at CIKM 2024

  6. arXiv:2407.19224  [pdf, other]

    cs.SD cs.MM eess.AS

    RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues

    Authors: Tianrui Pan, Jie Liu, Bohan Wang, Jie Tang, Gangshan Wu

    Abstract: While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, they demonstrate a severe performance drop in the multi-speaker separation scenarios. Typically, AVSS methods employ guiding videos to sequentially isolate individual speakers from the given audio mixture, resulting in notable missing and noisy parts ac…

    Submitted 29 July, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted by MM 2024

  7. arXiv:2407.19196  [pdf, other]

    cs.CL cs.AI cs.SI

    Why Misinformation is Created? Detecting them by Integrating Intent Features

    Authors: Bing Wang, Ximing Li, Changchun Li, Bo Fu, Songwen Pei, Shengsheng Wang

    Abstract: Various social media platforms, e.g., Twitter and Reddit, allow people to disseminate a plethora of information more efficiently and conveniently. However, they are inevitably full of misinformation, causing damage to diverse aspects of our daily lives. To reduce the negative impact, timely identification of misinformation, namely Misinformation Detection (MD), has become an active research topic…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures. Accepted by CIKM 2024

  8. arXiv:2407.19192  [pdf, other]

    cs.CL cs.CV cs.MM

    Harmfully Manipulated Images Matter in Multimodal Misinformation Detection

    Authors: Bing Wang, Shengsheng Wang, Changchun Li, Renchu Guan, Ximing Li

    Abstract: Nowadays, misinformation is widely spreading over various social media platforms and causes extremely negative impacts on society. To combat this issue, automatically identifying misinformation, especially those containing multimodal content, has attracted growing attention from the academic and industrial communities, and induced an active research topic named Multimodal Misinformation Detection…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024. Code: https://github.com/wangbing1416/HAMI-M3D

  9. arXiv:2407.18625  [pdf, other]

    cs.ET cs.AI cs.NE

    Topology Optimization of Random Memristors for Input-Aware Dynamic SNN

    Authors: Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang

    Abstract: There is unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot parallel human brains in terms of energy efficiency and the streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  10. arXiv:2407.17356  [pdf, other]

    cs.LG cs.NE

    Gradient-based inference of abstract task representations for generalization in neural networks

    Authors: Ali Hummos, Felipe del Río, Brabeeba Mien Wang, Julio Hurtado, Cristian B. Calderon, Guangyu Robert Yang

    Abstract: Humans and many animals show remarkably adaptive behavior and can respond differently to the same input depending on their internal goals. The brain not only represents the intermediate abstractions needed to perform a computation but also actively maintains a representation of the computation itself (task abstraction). Such separation of the computation and its abstraction is associated with fast…

    Submitted 24 July, 2024; originally announced July 2024.

  11. arXiv:2407.16940  [pdf, other]

    cs.LG q-bio.GN

    GV-Rep: A Large-Scale Dataset for Genetic Variant Representation Learning

    Authors: Zehui Li, Vallijah Subasri, Guy-Bart Stan, Yiren Zhao, Bo Wang

    Abstract: Genetic variants (GVs) are defined as differences in the DNA sequences among individuals and play a crucial role in diagnosing and treating genetic diseases. The rapid decrease in next generation sequencing cost has led to an exponential increase in patient-level GV data. This growth poses a challenge for clinicians who must efficiently prioritize patient-specific GVs and integrate them with exist…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Preprint

  12. arXiv:2407.15590  [pdf, other]

    cs.CV

    All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

    Authors: Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang

    Abstract: In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UM…

    Submitted 22 July, 2024; originally announced July 2024.

  13. arXiv:2407.15451  [pdf, other]

    cs.CV

    Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions

    Authors: Yihao Ai, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, Robby T. Tan

    Abstract: Existing 2D human pose estimation research predominantly concentrates on well-lit scenarios, with limited exploration of poor lighting conditions, which are a prevalent aspect of daily life. Recent studies on low-light pose estimation require the use of paired well-lit and low-light images with ground truths for training, which are impractical due to the inherent challenges associated with annotat…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 18 pages, 3 figures. Accepted by ECCV24

  14. arXiv:2407.15267  [pdf, other]

    cs.CR

    A Learning-Based Attack Framework to Break SOTA Poisoning Defenses in Federated Learning

    Authors: Yuxin Yang, Qiang Li, Chenfei Nie, Yuan Hong, Meng Pang, Binghui Wang

    Abstract: Federated Learning (FL) is a novel client-server distributed learning framework that can protect data privacy. However, recent works show that FL is vulnerable to poisoning attacks. Many defenses with robust aggregators (AGRs) are proposed to mitigate the issue, but they are all broken by advanced attacks. Very recently, some renewed robust AGRs are designed, typically with novel clipping or/and f…

    Submitted 24 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: This is an extended version of our CIKM 2024 paper

  15. arXiv:2407.14880  [pdf, other]

    cs.CV

    A New Dataset and Framework for Real-World Blurred Images Super-Resolution

    Authors: Rui Qin, Ming Sun, Chao Zhou, Bin Wang

    Abstract: Recent Blind Image Super-Resolution (BSR) methods have shown proficiency in general images. However, we find that the efficacy of recent methods obviously diminishes when employed on image data with blur, while image data with intentional blur constitute a substantial proportion of general data. To further investigate and address this issue, we developed a new super-resolution dataset specifically…

    Submitted 20 July, 2024; originally announced July 2024.

  16. arXiv:2407.14710  [pdf, other]

    cs.LG cs.CR

    Universally Harmonizing Differential Privacy Mechanisms for Federated Learning: Boosting Accuracy and Convergence

    Authors: Shuya Feng, Meisam Mohammady, Hanbin Hong, Shenao Yan, Ashish Kundu, Binghui Wang, Yuan Hong

    Abstract: Differentially private federated learning (DP-FL) is a promising technique for collaborative model training while ensuring provable privacy for clients. However, optimizing the tradeoff between privacy and accuracy remains a critical challenge. To our best knowledge, we propose the first DP-FL framework (namely UDP-FL), which universally harmonizes any randomization mechanism (e.g., an optimal one…

    Submitted 23 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  17. arXiv:2407.13925  [pdf, other]

    physics.data-an cs.LG hep-ph stat.ML

    EggNet: An Evolving Graph-based Graph Attention Network for Particle Track Reconstruction

    Authors: Paolo Calafiura, Jay Chan, Loic Delabrouille, Brandon Wang

    Abstract: Track reconstruction is a crucial task in particle experiments and is traditionally very computationally expensive due to its combinatorial nature. Recently, graph neural networks (GNNs) have emerged as a promising approach that can improve scalability. Most of these GNN-based methods, including the edge classification (EC) and the object condensation (OC) approach, require an input graph that nee…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  18. arXiv:2407.13773  [pdf, other]

    cs.DL cs.AI

    OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

    Authors: Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin

    Abstract: The advancement of artificial intelligence (AI) hinges on the quality and accessibility of data, yet the current fragmentation and variability of data sources hinder efficient data utilization. The dispersion of data sources and diversity of data formats often lead to inefficiencies in data retrieval and processing, significantly impeding the progress of AI research and applications. To address th…

    Submitted 4 June, 2024; originally announced July 2024.

  19. arXiv:2407.13301  [pdf, other]

    cs.CL cs.AI cs.LG

    CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

    Authors: Junying Chen, Chi Gui, Anningzhe Gao, Ke Ji, Xidong Wang, Xiang Wan, Benyou Wang

    Abstract: The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within these models remain largely unaddressed. This study introduces Chain-of-Diagnosis (CoD) to enhance the interpretability of LLM-based medical diagnostics. CoD transforms the diagnostic process into a diagnostic chain that mirrors a…

    Submitted 18 July, 2024; originally announced July 2024.

  20. arXiv:2407.13246  [pdf, other]

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang, et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d…

    Submitted 18 July, 2024; originally announced July 2024.

  21. arXiv:2407.13237  [pdf, other]

    cs.AI

    LLM-Empowered State Representation for Reinforcement Learning

    Authors: Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji

    Abstract: Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate mappings from states to task rewards. Traditional methods typically depend on extensive sample learning to enrich state representations with task-specific information, which leads to low sample efficiency and high time…

    Submitted 18 July, 2024; originally announced July 2024.

  22. arXiv:2407.13096  [pdf, other]

    cs.PF

    DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information

    Authors: Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang

    Abstract: Increased reliance on graphics processing units (GPUs) for high-intensity computing tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (DVFS) has emerged as a promising technique for conserving energy while maintaining the quality of service (QoS) of GPU applications. However, existing solutions using DVFS are hindered by inefficiency…

    Submitted 17 July, 2024; originally announced July 2024.

  23. arXiv:2407.12943  [pdf, other]

    cs.CL cs.AI

    Halu-J: Critique-Based Hallucination Judge

    Authors: Binjie Wang, Steffi Chern, Ethan Chern, Pengfei Liu

    Abstract: Large language models (LLMs) frequently generate non-factual content, known as hallucinations. Existing retrieval-augmented-based hallucination detection approaches typically address this by framing it as a classification task, evaluating hallucinations based on their consistency with retrieved evidence. However, this approach usually lacks detailed explanations for these evaluations and does not…

    Submitted 17 July, 2024; originally announced July 2024.

  24. arXiv:2407.12258  [pdf, other]

    cs.CV

    Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge

    Authors: Kang Shen, Xuxiong Liu, Boyan Wang, Jun Yao, Xin Liu, Yujie Guan, Yu Wang, Gengchen Li, Xiao Sun

    Abstract: In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integr…

    Submitted 26 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  25. arXiv:2407.12257  [pdf, other]

    cs.CV

    Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge

    Authors: Xuxiong Liu, Kang Shen, Jun Yao, Boyan Wang, Minrui Liu, Liuwei An, Zishun Cui, Weijie Feng, Xiao Sun

    Abstract: Compound Expression Recognition (CER) is vital for effective interpersonal interactions. Human emotional expressions are inherently complex due to the presence of compound expressions, requiring the consideration of both local and global facial cues for accurate judgment. In this paper, we propose an ensemble learning-based solution to address this complexity. Our approach involves training three…

    Submitted 26 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.12572 by other authors

  26. arXiv:2407.11705  [pdf, other]

    cs.RO eess.SP

    Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems

    Authors: Jianzhu Huai, Binliang Wang, Yuan Zhuang, Yiwen Chen, Qipeng Li, Yulong Han, Charles Toth

    Abstract: 4D radars are increasingly favored for odometry and mapping of autonomous systems due to their robustness in harsh weather and dynamic environments. Existing datasets, however, often cover limited areas and are typically captured using a single platform. To address this gap, we present a diverse large-scale dataset specifically designed for 4D radar-based localization and mapping. This dataset was…

    Submitted 22 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages, 4 figures, 5 tables

  27. arXiv:2407.10990  [pdf]

    cs.CL cs.AI

    MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

    Authors: Mianxin Liu, Jinru Ding, Jie Xu, Weiguo Hu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang

    Abstract: Ensuring the general efficacy and goodness for human beings from medical large language models (LLM) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLM, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese med…

    Submitted 23 June, 2024; originally announced July 2024.

    Comments: 25 pages, 4 figures

  28. arXiv:2407.10135  [pdf, other]

    cs.CV

    FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection

    Authors: Zheng Jiang, Jinqing Zhang, Yanan Zhang, Qingjie Liu, Zhenghui Hu, Baohui Wang, Yunhong Wang

    Abstract: Although multi-view 3D object detection based on the Bird's-Eye-View (BEV) paradigm has garnered widespread attention as an economical and deployment-friendly perception solution for autonomous driving, there is still a performance gap compared to LiDAR-based methods. In recent years, several cross-modal distillation methods have been proposed to transfer beneficial information from teacher models…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  29. arXiv:2407.09522  [pdf, other]

    cs.DB cs.AI cs.LG stat.ML

    UQE: A Query Engine for Unstructured Databases

    Authors: Hanjun Dai, Bethany Yixin Wang, Xingchen Wan, Bo Dai, Sherry Yang, Azade Nova, Pengcheng Yin, Phitchaya Mangpo Phothilimthana, Charles Sutton, Dale Schuurmans

    Abstract: Analytics on structured data is a mature field with many successful methods. However, most real world data exists in unstructured form, such as images and conversations. We investigate the potential of Large Language Models (LLMs) to enable unstructured data analytics. In particular, we propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data…

    Submitted 23 June, 2024; originally announced July 2024.

  30. arXiv:2407.09378  [pdf, other]

    cs.LG cs.AI stat.ML

    Graph Neural Network Causal Explanation via Neural Causal Models

    Authors: Arman Behnam, Binghui Wang

    Abstract: Graph neural network (GNN) explainers identify the important subgraph that ensures the prediction for a given graph. Until now, almost all GNN explainers are based on association, which is prone to spurious correlations. We propose {\name}, a GNN causal explainer via causal inference. Our explainer is based on the observation that a graph often consists of a causal underlying subgraph. {\name} inc…

    Submitted 12 July, 2024; originally announced July 2024.

  31. arXiv:2407.09057  [pdf, other]

    cs.CV

    PersonificationNet: Making customized subject act like a person

    Authors: Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua

    Abstract: Recently customized generation has significant potential, which uses as few as 3-5 user-provided images to train a model to synthesize new images of a specified subject. Though subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control over the given subject acting like the person's pose is still lack of study. In this paper, we propose a Personifi…

    Submitted 12 July, 2024; originally announced July 2024.

  32. arXiv:2407.09011  [pdf, other]

    cs.CL cs.AI cs.LG

    One Stone, Four Birds: A Comprehensive Solution for QA System Using Supervised Contrastive Learning

    Authors: Bo Wang, Tsunenori Mine

    Abstract: This paper presents a novel and comprehensive solution to enhance both the robustness and efficiency of question answering (QA) systems through supervised contrastive learning (SCL). Training a high-performance QA system has become straightforward with pre-trained language models, requiring only a small amount of data and simple fine-tuning. However, despite recent advances, existing QA systems st…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 14 pages, under review

  33. arXiv:2407.08990  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Dynamic neural network with memristive CIM and CAM for 2D and 3D vision

    Authors: Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: In press

  34. arXiv:2407.08935  [pdf, other]

    cs.CR

    Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses

    Authors: Yuxin Yang, Qiang Li, Jinyuan Jia, Yuan Hong, Binghui Wang

    Abstract: Federated graph learning (FedGL) is an emerging federated learning (FL) framework that extends FL to learn graph data from diverse sources. FL for non-graph data has shown to be vulnerable to backdoor attacks, which inject a shared backdoor trigger into the training data such that the trained backdoored FL model can predict the testing data containing the trigger as the attacker desires. However,…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper is accepted to CCS2024

  35. arXiv:2407.06985  [pdf, other]

    cs.AI

    PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods

    Authors: Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Han Ji, Hong Chen, Jinshi Zhang, Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu

    Abstract: In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PE…

    Submitted 9 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  36. arXiv:2407.05690  [pdf, other]

    cs.CL cs.AI

    Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

    Authors: Bowen Shen, Zheng Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang

    Abstract: Structured pruning fundamentally reduces computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as the coarse-grained structured pruning poses large damage to the highly interconnected model, achieving a…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Findings of ACL 2024

  37. arXiv:2407.05682  [pdf, other]

    cs.CL

    Retrieved In-Context Principles from Previous Mistakes

    Authors: Hao Sun, Yong Jiang, Bo Wang, Yingyan Hou, Yan Zhang, Pengjun Xie, Fei Huang

    Abstract: In-context learning (ICL) has been instrumental in adapting Large Language Models (LLMs) to downstream tasks using correct input-output examples. Recent advances have attempted to improve model performance through principles derived from mistakes, yet these approaches suffer from lack of customization and inadequate error coverage. To address these limitations, we propose Retrieved In-Context Prin…

    Submitted 8 July, 2024; originally announced July 2024.

  38. arXiv:2407.05616  [pdf, other]

    cs.CV

    Explainable Image Recognition via Enhanced Slot-attention Based Classifier

    Authors: Bowen Wang, Liangzhi Li, Jiahao Zhang, Yuta Nakashima, Hajime Nagahara

    Abstract: The imperative to comprehend the behaviors of deep learning models is of utmost importance. In this realm, Explainable Artificial Intelligence (XAI) has emerged as a promising avenue, garnering increasing interest in recent years. Despite this, most existing methods primarily depend on gradients or input perturbation, which often fails to embed explanations directly within the model's decision-mak…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures

  39. arXiv:2407.05530  [pdf, other]

    cs.RO cs.AI cs.CV

    This&That: Language-Gesture Controlled Video Generation for Robot Planning

    Authors: Boyang Wang, Nikhil Sridhar, Chao Feng, Mark Van der Merwe, Adam Fishman, Nima Fazeli, Jeong Joon Park

    Abstract: We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communicat…

    Submitted 7 July, 2024; originally announced July 2024.

  40. arXiv:2407.05389  [pdf, other]

    cs.CV cs.AI

    Image-Conditional Diffusion Transformer for Underwater Image Enhancement

    Authors: Xingyang Nie, Su Pan, Xiaoyu Zhai, Shifei Tao, Fengzhong Qu, Biao Wang, Huilin Ge, Guojie Xiao

    Abstract: Underwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operation and marine engineering. Motivated by the recent advance in generative models, we propose a novel UIE method based on image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and converts it into latent space where ICDT is ap…

    Submitted 7 July, 2024; originally announced July 2024.

  41. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  42. arXiv:2407.04285  [pdf, other]

    cs.LG cs.AI

    Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

    Authors: Jiawei Xu, Rui Yang, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

    Abstract: Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods. Our study indicates that traditional offline RL methods…

    Submitted 5 July, 2024; originally announced July 2024.

  43. arXiv:2407.04064  [pdf, other]

    cs.RO

    Collision Avoidance for Multiple UAVs in Unknown Scenarios with Causal Representation Disentanglement

    Authors: Jiafan Zhuang, Zihao Xia, Gaofei Han, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning tasks for multi-UAV systems. However, existing DRL-based methods often suffer from performance degradation when tackling unseen scenarios, since the non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, i.e.,…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  44. arXiv:2407.04056  [pdf, other]

    cs.RO

    Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

    Authors: Jiafan Zhuang, Gaofei Han, Zihao Xia, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: In unseen and complex outdoor environments, collision avoidance navigation for unmanned aerial vehicle (UAV) swarms presents a challenging problem. It requires UAVs to navigate through various obstacles and complex backgrounds. Existing collision avoidance navigation methods based on deep reinforcement learning show promising performance but suffer from poor generalization abilities, resulting in…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  45. arXiv:2407.03320  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, et al. (2 additional authors not shown)

    Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

  46. arXiv:2407.02886  [pdf, other]

    cs.CR

    A Wolf in Sheep's Clothing: Practical Black-box Adversarial Attacks for Evading Learning-based Windows Malware Detection in the Wild

    Authors: Xiang Ling, Zhiyu Wu, Bin Wang, Wei Deng, Jingzheng Wu, Shouling Ji, Tianyue Luo, Yanjun Wu

    Abstract: Given the remarkable achievements of existing learning-based malware detection in both academia and industry, this paper presents MalGuise, a practical black-box adversarial attack framework that evaluates the security risks of existing learning-based Windows malware detection systems under the black-box setting. MalGuise first employs a novel semantics-preserving transformation of call-based redi…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by 33rd USENIX Security Symposium 2024

  47. arXiv:2407.02610  [pdf, other]

    cs.LG cs.DC

    Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point

    Authors: Bokun Wang, Axel Berg, Durmus Alp Emre Acar, Chuteng Zhou

    Abstract: Recent work has shown that 8-bit floating point (FP8) can be used for efficiently training neural networks with reduced computational overhead compared to training in FP32/FP16. In this work, we investigate the use of FP8 training in a federated learning context. This brings not only the usual benefits of FP8 which are desirable for on-device training at the edge, but also reduces client-server co…

    Submitted 2 July, 2024; originally announced July 2024.

  48. arXiv:2407.02485  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs

    Authors: Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG). In this work, we propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG. In particular, the instruction-tuned LLMs work surprisingly well by adding a small fraction o…

    Submitted 2 July, 2024; originally announced July 2024.

  49. arXiv:2407.02345  [pdf, other]

    cs.CL

    MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

    Authors: Yihong Tang, Bo Wang, Dongming Zhao, Xiaojia Jin, Jijun Zhang, Ruifang He, Yuexian Hou

    Abstract: Personalized Dialogue Generation (PDG) aims to create coherent responses according to roles or personas. Traditional PDG relies on external role data, which can be scarce and raise privacy concerns. Approaches address these issues by extracting role information from dialogue history, which often fail to generically model roles in continuous space. To overcome these limitations, we introduce a nove…

    Submitted 2 July, 2024; originally announced July 2024.

  50. arXiv:2407.01897  [pdf, other]

    cs.CL

    Proposal Report for the 2nd SciCAP Competition 2024

    Authors: Pengpeng Li, Tingmin Li, Jingyuan Wang, Boyuan Wang, Yang Yang

    Abstract: In this paper, we propose a method for document summarization using auxiliary information. This approach effectively summarizes descriptions related to specific images, tables, and appendices within lengthy texts. Our experiments demonstrate that leveraging high-quality OCR data and initially extracted information from the original text enables efficient summarization of the content related to des…

    Submitted 1 July, 2024; originally announced July 2024.