Showing 1–50 of 346 results for author: Luo, W

Searching in archive cs.
  1. arXiv:2409.02919

    cs.CV

    HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

    Authors: Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Peng Li, Yan Li, Chi-Min Chan, Qifeng Chen, Wei Xue, Wenhan Luo, Qingfeng Liu, Yike Guo

    Abstract: The potential for higher-resolution image generation using pretrained diffusion models is immense, yet these models often struggle with issues of object repetition and structural artifacts especially when scaling to 4K resolution and higher. We figure out that the problem is caused by that, a single prompt for the generation of multiple scales provides insufficient efficacy. In response, we propos…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02370

    cs.CL cs.AI

    Do Large Language Models Possess Sensitive to Sentiment?

    Authors: Yang Liu, Xichou Zhu, Zhou Shen, Yi Liu, Min Li, Yujun Chen, Benzi John, Zhenzhen Ma, Tao Hu, Zhiyang Xu, Wei Luo, Junhui Wang

    Abstract: Large Language Models (LLMs) have recently displayed their extraordinary capabilities in language understanding. However, how to comprehensively assess the sentiment capabilities of LLMs continues to be a challenge. This paper investigates the ability of LLMs to detect and react to sentiment in text modal. As the integration of LLMs into diverse applications is on the rise, it becomes highly criti…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 10 pages, 2 figures

  3. arXiv:2408.12830

    cs.LG stat.ML

    SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning

    Authors: Wang Luo, Haoran Li, Zicheng Zhang, Congying Han, Jiayu Lv, Tiande Guo

    Abstract: Model-based Offline Reinforcement Learning trains policies based on offline datasets and model dynamics, without direct real-world environment interactions. However, this method is inherently challenged by distribution shift. Previous approaches have primarily focused on tackling this issue directly leveraging off-policy mechanisms and heuristic uncertainty in model dynamics, but they resulted in…

    Submitted 23 August, 2024; originally announced August 2024.

  4. arXiv:2408.12822

    cs.RO eess.SY

    Courteous MPC for Autonomous Driving with CBF-inspired Risk Assessment

    Authors: Yanze Zhang, Yiwei Lyu, Sude E. Demir, Xingyu Zhou, Yupeng Yang, Junmin Wang, Wenhao Luo

    Abstract: With more autonomous vehicles (AVs) sharing roadways with human-driven vehicles (HVs), ensuring safe and courteous maneuvers that respect HVs' behavior becomes increasingly important. To promote both safety and courtesy in AV's behavior, an extension of Control Barrier Functions (CBFs)-inspired risk evaluation framework is proposed in this paper by considering both noisy observed positions and vel…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 7 pages, accepted to ITSC 2024

  5. arXiv:2408.12191

    cs.CV

    Transientangelo: Few-Viewpoint Surface Reconstruction Using Single-Photon Lidar

    Authors: Weihan Luo, Anagh Malik, David B. Lindell

    Abstract: We consider the problem of few-viewpoint 3D surface reconstruction using raw measurements from a lidar system. Lidar captures 3D scene geometry by emitting pulses of light to a target and recording the speed-of-light time delay of the reflected light. However, conventional lidar systems do not output the raw, captured waveforms of backscattered light; instead, they pre-process these data into a 3D…

    Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: https://weihan1.github.io/transientangelo/

  6. arXiv:2408.11396

    cs.CL

    MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

    Authors: Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen

    Abstract: Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of the ability of original languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, ind…

    Submitted 21 August, 2024; originally announced August 2024.

  7. arXiv:2408.10943

    cs.CL

    SysBench: Can Large Language Models Follow System Messages?

    Authors: Yanzhao Qin, Tao Zhang, Tao Zhang, Yanjun Shen, Wenjing Luo, Haoze Sun, Yan Zhang, Yujing Qiao, Weipeng Chen, Zenan Zhou, Wentao Zhang, Bin Cui

    Abstract: Large Language Models (LLMs) have become instrumental across various applications, with the customization of these models to specific scenarios becoming increasingly critical. System message, a fundamental component of LLMs, is consist of carefully crafted instructions that guide the behavior of model to meet intended goals. Despite the recognized potential of system messages to optimize AI-driven…

    Submitted 20 August, 2024; originally announced August 2024.

  8. arXiv:2408.09108

    cs.AI cs.CV

    Temporal Reversed Training for Spiking Neural Networks with Generalized Spatio-Temporal Representation

    Authors: Lin Zuo, Yongqi Ding, Wenwei Luo, Mengmeng Jing, Xianlong Tian, Kunshan Yang

    Abstract: Spiking neural networks (SNNs) have received widespread attention as an ultra-low energy computing paradigm. Recent studies have focused on improving the feature extraction capability of SNNs, but they suffer from inefficient inference and suboptimal performance. In this paper, we propose a simple yet effective temporal reversed training (TRT) method to optimize the spatio-temporal performance of…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 15 pages, 8 figures

  9. arXiv:2408.07500

    cs.CV

    Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

    Authors: Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang

    Abstract: In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) L…

    Submitted 2 September, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Published at ECCV 2024

  10. arXiv:2408.06658

    cs.SI

    ComGPT: Detecting Local Community Structure with Large Language Models

    Authors: Li Ni, Haowen Shen, Lin Mu, Yiwen Zhang, Wenjian Luo

    Abstract: Large language models (LLMs), like GPT, have demonstrated the ability to understand graph structures and have achieved excellent performance in various graph reasoning tasks such as node classification. So far, how to leverage LLMs to better detect local communities remains underexplored. Local community detection algorithms based on seed expansion often face a seed-dependent problem, community di…

    Submitted 13 August, 2024; originally announced August 2024.

  11. arXiv:2408.06087

    cs.CL cs.AI cs.LG

    Building Decision Making Models Through Language Model Regime

    Authors: Yu Zhang, Haoxiang Liu, Feijun Jiang, Weihua Luo, Kaifu Zhang

    Abstract: We propose a novel approach for decision making problems leveraging the generalization capabilities of large language models (LLMs). Traditional methods such as expert systems, planning algorithms, and reinforcement learning often exhibit limited generalization, typically requiring the training of new models for each unique task. In contrast, LLMs demonstrate remarkable success in generalizing acr…

    Submitted 12 August, 2024; originally announced August 2024.

  12. arXiv:2408.06047

    cs.CV

    BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

    Authors: Xuanpu Zhang, Dan Song, Pengxin Zhan, Qingguo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Anan Liu

    Abstract: Image-based virtual try-on is an increasingly popular and important task to generate realistic try-on images of specific person. Existing methods always employ an accurate mask to remove the original garment in the source image, thus achieving realistic synthesized images in simple and conventional try-on scenarios based on powerful diffusion model. Therefore, acquiring suitable mask is vital to t…

    Submitted 12 August, 2024; originally announced August 2024.

  13. arXiv:2408.03220

    cs.LG cs.DC

    Masked Random Noise for Communication Efficient Federaetd Learning

    Authors: Shiwei Li, Yingyi Cheng, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Dugang Liu, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinders training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we request the distributed clients to find optimal model updates relative to global model parameters withi…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by MM 2024

  14. arXiv:2408.03215

    cs.LG cs.DC

    FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

    Authors: Shiwei Li, Wenchao Xu, Haozhao Wang, Xing Tang, Yining Qi, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users' privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize mode…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ICML 2024

  15. arXiv:2408.01803

    cs.LG cs.CL

    STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

    Authors: Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo, Xiaowen Chu

    Abstract: In this paper, we present STBLLM, the first structural binarization framework for compressing Large Language Models (LLMs) to less than 1-bit precision. LLMs have achieved remarkable performance, but their heavy memory requirements have hindered widespread adoption, particularly on resource-constrained devices. Binarization, which quantifies weights to a mere 1-bit, achieves a milestone in increas…

    Submitted 3 August, 2024; originally announced August 2024.

  16. arXiv:2408.01122

    cs.CL

    CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

    Authors: Tao Zhang, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Tao Zhang, Fan Yang, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: The adeptness of Large Language Models (LLMs) in comprehending and following natural language instructions is critical for their deployment in sophisticated real-world applications. Existing evaluations mainly focus on fragmented constraints or narrow scenarios, but they overlook the comprehensiveness and authenticity of constraints from the user's perspective. To bridge this gap, we propose CFBen…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 15 pages, 10 figures

  17. arXiv:2407.20962

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 pages. Dataset report

  18. arXiv:2407.17854

    cs.AI cs.CL cs.MM

    Shapley Value-based Contrastive Alignment for Multimodal Information Extraction

    Authors: Wen Luo, Yu Xia, Shen Tianshu, Sujian Li

    Abstract: The rise of social media and the exponential growth of multimodal communication necessitates advanced techniques for Multimodal Information Extraction (MIE). However, existing methodologies primarily rely on direct Image-Text interactions, a paradigm that often faces significant challenges due to semantic and modality gaps between images and text. In this paper, we introduce a new paradigm of Imag…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted at ACM Multimedia 2024

  19. arXiv:2407.17097

    cs.LG cs.AI

    Towards Robust Knowledge Tracing Models via k-Sparse Attention

    Authors: Shuyan Huang, Zitao Liu, Xiangyu Zhao, Weiqi Luo, Jian Weng

    Abstract: Knowledge tracing (KT) is the problem of predicting students' future performance based on their historical interaction sequences. With the advanced capability of capturing contextual long-term dependency, attention mechanism becomes one of the essential components in many deep learning based KT (DLKT) models. In spite of the impressive performance achieved by these attentional DLKT models, many of…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted at SIGIR'2023 (revised version with additional results)

  20. arXiv:2407.13096

    cs.PF

    DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information

    Authors: Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang

    Abstract: Increased reliance on graphics processing units (GPUs) for high-intensity computing tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (DVFS) has emerged as a promising technique for conserving energy while maintaining the quality of service (QoS) of GPU applications. However, existing solutions using DVFS are hindered by inefficiency…

    Submitted 17 July, 2024; originally announced July 2024.

  21. arXiv:2407.11162

    cs.CV

    Integrating Amortized Inference with Diffusion Models for Learning Clean Distribution from Corrupted Images

    Authors: Yifei Wang, Weimin Bai, Weijian Luo, Wenzheng Chen, He Sun

    Abstract: Diffusion models (DMs) have emerged as powerful generative models for solving inverse problems, offering a good approximation of prior distributions of real-world image data. Typically, diffusion models rely on large-scale clean signals to accurately learn the score functions of ground truth clean image distributions. However, such a requirement for large amounts of clean data is often impractical…

    Submitted 15 July, 2024; originally announced July 2024.

  22. arXiv:2407.05563

    cs.CL

    LLMBox: A Comprehensive Library for Large Language Models

    Authors: Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets,…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Demo

  23. arXiv:2407.05396

    cs.CR cs.AI

    Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense

    Authors: Qi Zhou, Zipeng Ye, Yubo Tang, Wenjian Luo, Yuhui Shi, Yan Jia

    Abstract: Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition. However, DNN model is fragile to backdoor attack. A backdoor in the DNN model can be activated by a poisoned input with trigger and leads to wrong prediction, which causes serious security issues in applications. It is challenging for current defenses to eliminate the backdoor effective…

    Submitted 14 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  24. arXiv:2407.03569

    cs.RO

    Safety-Critical Control with Uncertainty Quantification using Adaptive Conformal Prediction

    Authors: Hao Zhou, Yanze Zhang, Wenhao Luo

    Abstract: Safety assurance is critical in the planning and control of robotic systems. For robots operating in the real world, the safety-critical design often needs to explicitly address uncertainties and the pre-computed guarantees often rely on the assumption of the particular distribution of the uncertainty. However, it is difficult to characterize the actual uncertainty distribution beforehand and thus…

    Submitted 8 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 7 pages, accepted to ACC 2024

  25. arXiv:2407.00955

    cs.IT cs.AI eess.SP

    Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy

    Authors: Xiang Jiao, Dingzhu Wen, Guangxu Zhu, Wei Jiang, Wu Luo, Yuanming Shi

    Abstract: Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the e…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This paper was accepted by IEEE Transactions on Vehicular Technology on June 30, 2024

  26. arXiv:2406.14548

    cs.LG cs.CV

    Consistency Models Made Easy

    Authors: Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, J. Zico Kolter

    Abstract: Consistency models (CMs) are an emerging class of generative models that offer faster sampling than traditional diffusion models. CMs enforce that all points along a sampling trajectory are mapped to the same initial point. But this target leads to resource-intensive training: for example, as of 2024, training a SoTA CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an alternative…

    Submitted 20 June, 2024; originally announced June 2024.

  27. arXiv:2406.13007

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao, et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  28. arXiv:2406.12802

    cs.RO

    Decentralized Multi-Robot Line-of-Sight Connectivity Maintenance under Uncertainty

    Authors: Yupeng Yang, Yiwei Lyu, Yanze Zhang, Sha Yi, Wenhao Luo

    Abstract: In this paper, we propose a novel decentralized control method to maintain Line-of-Sight connectivity for multi-robot networks in the presence of Gaussian-distributed localization uncertainty. In contrast to most existing work that assumes perfect positional information about robots or enforces overly restrictive rigid formation against uncertainty, our method enables robots to preserve Line-of-Si…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by RSS 2024

  29. arXiv:2406.11507

    cs.CV

    Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection

    Authors: Haiming Yao, Yunkang Cao, Wei Luo, Weihang Zhang, Wenyong Yu, Weiming Shen

    Abstract: Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often demand distinct models for specific categories, resulting in substantial deployment costs. This raises concerns about multi-class anomaly detection, where a unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industrial Informatics

  30. arXiv:2406.10985

    cs.CL

    Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

    Authors: Weiyao Luo, Suncong Zheng, Heming Xia, Weikang Wang, Yan Lei, Tianyu Liu, Shuang Chen, Zhifang Sui

    Abstract: Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer a performance degradation when modeling long-term contexts due to they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath…

    Submitted 16 June, 2024; originally announced June 2024.

  31. arXiv:2406.07648

    cs.CV

    M-LRM: Multi-view Large Reconstruction Model

    Authors: Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when extending its input from single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, as well as slower convergence speed than expected. It is attributed to that, LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the…

    Submitted 11 June, 2024; originally announced June 2024.

  32. arXiv:2406.07333

    cs.CV

    Global-Regularized Neighborhood Regression for Efficient Zero-Shot Texture Anomaly Detection

    Authors: Haiming Yao, Wei Luo, Yunkang Cao, Yiheng Zhang, Wenyong Yu, Weiming Shen

    Abstract: Texture surface anomaly detection finds widespread applications in industrial settings. However, existing methods often necessitate gathering numerous samples for model training. Moreover, they predominantly operate within a close-set detection framework, limiting their ability to identify anomalies beyond the training dataset. To tackle these challenges, this paper introduces a novel zero-shot te…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Submission to IEEE Transactions on Systems, Man, and Cybernetics: Systems

  33. arXiv:2406.07115

    cs.CL cs.AI cs.LG

    Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

    Authors: Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang

    Abstract: Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to enhance their reasoning capabilities on complex tasks, thus taking on the role of intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2024] utilizes the depth-first search-based decision tree (DFSDT) method for reasoning with $16000+$ real-world APIs, whi…

    Submitted 11 June, 2024; originally announced June 2024.

  34. arXiv:2406.07070

    cs.CL

    HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

    Authors: Wen Luo, Tianshu Shen, Wei Li, Guangyue Peng, Richeng Xuan, Houfeng Wang, Xi Yang

    Abstract: Large Language Models (LLMs) have significantly advanced the field of Natural Language Processing (NLP), achieving remarkable performance across diverse tasks and enabling widespread real-world applications. However, LLMs are prone to hallucination, generating content that either conflicts with established knowledge or is unfaithful to the original sources. Existing hallucination benchmarks primar…

    Submitted 11 June, 2024; originally announced June 2024.

  35. arXiv:2406.07006

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop: Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  36. arXiv:2406.05343

    cs.AI cs.CL

    M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

    Authors: Wei Song, Yadong Li, Jianhua Xu, Guowei Wu, Lingfeng Ming, Kexin Yi, Weihua Luo, Houyi Li, Yi Du, Fangda Guo, Kaicheng Yu

    Abstract: As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating solely on task performance, such as the accuracy of identifying the attribute of an object. Combining well-developed…

    Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  37. arXiv:2406.03496

    cs.CL cs.AI cs.LG

    Wings: Learning Multimodal LLMs without Text-only Forgetting

    Authors: Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: Multimodal large language models (MLLMs), initiated with a trained LLM, first align images with text and then fine-tune on multimodal mixed inputs. However, the MLLM catastrophically forgets the text-only instructions, which do not include images and can be addressed within the initial LLM. In this paper, we present Wings, a novel MLLM that excels in both text-only dialogues and multimodal compreh…

    Submitted 5 June, 2024; originally announced June 2024.

  38. arXiv:2406.03035

    cs.CV

    Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

    Authors: Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

    Abstract: Pose-controllable character video generation is in high demand with extensive applications for fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple characte…

    Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  39. arXiv:2406.02539

    cs.CV cs.AI cs.CL cs.LG

    Parrot: Multilingual Visual Instruction Tuning

    Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training p…

    Submitted 11 August, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Code is available at: https://github.com/AIDC-AI/Parrot

  40. arXiv:2405.20797

    cs.CV cs.AI cs.CL cs.LG

    Ovis: Structural Embedding Alignment for Multimodal Large Language Model

    Authors: Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye

    Abstract: Current Multimodal Large Language Models (MLLMs) typically integrate a pre-trained LLM with another pre-trained vision transformer through a connector, such as an MLP, endowing the LLM with visual capabilities. However, the misalignment between two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly…

    Submitted 17 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  41. arXiv:2405.20773

    cs.CR cs.AI

    Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

    Authors: Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu

    Abstract: With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), ensuring their safety has become increasingly critical. To achieve this objective, it requires us to proactively discover the vulnerability of MLLMs by exploring the attack methods. Thus, structure-based jailbreak attacks, where harmful semantic content is embedded within images, have been proposed to mislead th…

    Submitted 12 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  42. arXiv:2405.16874  [pdf, other]

    cs.CV

    CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

    Authors: Xingqun Qi, Hengyuan Zhang, Yatian Wang, Jiahao Pan, Chen Liu, Peng Li, Xiaowei Chi, Mengfei Li, Qixun Zhang, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Deriving co-speech 3D gestures has seen tremendous progress in virtual avatar animation. Yet, the existing methods often produce stiff and unreasonable gestures with unseen human speech inputs due to the limited 3D speech-gesture data. In this paper, we propose CoCoGesture, a novel framework enabling vivid and diverse gesture synthesis from unseen human speech prompts. Our key insight is built upo…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: The dataset will be released as soon as possible

  43. arXiv:2405.15831  [pdf, other]

    eess.SY cs.AI cs.LG

    Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map

    Authors: Shunyu Liu, Wei Luo, Yanzhen Zhou, Kaixuan Chen, Quan Zhang, Huating Xu, Qinglai Guo, Mingli Song

    Abstract: Transmission interface power flow adjustment is a critical measure to ensure the security and economy operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occur in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coup…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Power Systems

  44. arXiv:2405.13152  [pdf, other]

    cs.CV cs.AI

    Enhancing Interaction Modeling with Agent Selection and Physical Methods for Trajectory Prediction

    Authors: Shiji Huang, Lei Ye, Min Chen, Wenhai Luo, Chenqi Xu, Deyuan Liang, Dihong Wang

    Abstract: In this study, we address the limitations inherent in most existing vehicle trajectory prediction methodologies that indiscriminately incorporate all agents within a predetermined proximity when accounting for inter-agent interactions. These approaches commonly employ attention-based architecture or graph neural networks for encoding interactions, which introduces three challenges: (i) The indiscr…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/kkk00714/ASPILin

  45. arXiv:2405.12661  [pdf, other]

    cs.CV

    EmoEdit: Evoking Emotions through Image Manipulation

    Authors: Jingyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. This task is inherently complex due to its twofold objective: significantly evoking the intended emotion, while preserving the original image composition. Existing AIM methods primarily adjust color and style, often failing to elicit precise and profound emotional shifts. Drawing on psych…

    Submitted 21 May, 2024; originally announced May 2024.

  46. arXiv:2405.11616  [pdf, other]

    cs.CV

    Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

    Authors: Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo

    Abstract: In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images should…

    Submitted 29 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  47. arXiv:2405.11273  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

    Authors: Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To ad…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures. Project Website: https://uni-moe.github.io/. Work in progress

  48. arXiv:2405.09786  [pdf, other]

    cs.LG cs.CR

    IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency

    Authors: Linshan Hou, Ruili Feng, Zhongyun Hua, Wei Luo, Leo Yu Zhang, Yiming Li

    Abstract: Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries can maliciously trigger model misclassifications by implanting a hidden backdoor during model training. This paper proposes a simple yet effective input-level backdoor detection (dubbed IBD-PSC) as a `firewall' to filter out malicious testing images. Our method is motivated by an intriguing phenomenon, i.e., paramete…

    Submitted 2 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024, 31 pages

  49. arXiv:2405.05695  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

    Authors: Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma

    Abstract: We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure fo…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2024

    Journal ref: International Conference on Learning Representations (ICLR), 2024

  50. arXiv:2405.04795  [pdf, other]

    cs.LG

    Variational Schrödinger Diffusion Models

    Authors: Wei Deng, Weijian Luo, Yixin Tan, Marin Biloš, Yu Chen, Yuriy Nevmyvaka, Ricky T. Q. Chen

    Abstract: Schrödinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. However, SB requires estimating the intractable forward score functions, inevitably resulting in the costly implicit training loss based on simulated trajectories. To improve the scalability while preserving efficient transportation plans, we leverage variational inference to linearize…

    Submitted 19 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: ICML 2024