Showing 1–50 of 1,082 results for author: Sun, X

Searching in archive cs.
  1. arXiv:2407.04297  [pdf, other]

    cs.CR

    HuntFUZZ: Enhancing Error Handling Testing through Clustering Based Fuzzing

    Authors: Jin Wei, Ping Chen, Jun Dai, Xiaoyan Sun, Zhihao Zhang, Chang Xu, Yi Wanga

    Abstract: Testing a program's capability to effectively handle errors is a significant challenge, given that program errors are relatively uncommon. To solve this, Software Fault Injection (SFI)-based fuzzing integrates SFI and traditional fuzzing, injecting and triggering errors for testing (error handling) code. However, we observe that current SFI-based fuzzing approaches have overlooked the correlatio…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.04294  [pdf, other]

    cs.CR

    SQLaser: Detecting DBMS Logic Bugs with Clause-Guided Fuzzing

    Authors: Jin Wei, Ping Chen, Kangjie Lu, Jun Dai, Xiaoyan Sun

    Abstract: Database Management Systems (DBMSs) are vital components in modern data-driven systems. Their complexity often leads to logic bugs, which are implementation errors within the DBMSs that can lead to incorrect query results, data exposure, unauthorized access, etc., without necessarily causing visible system failures. Existing detection employs two strategies: rule-based bug detection and coverage-g…

    Submitted 5 July, 2024; originally announced July 2024.

  3. arXiv:2407.03942  [pdf, other]

    cs.AI cs.CL cs.HC

    Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data

    Authors: Zihui Gu, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Cheng-Zhong Xu, Ju Fan

    Abstract: Instruction-following is particularly crucial for large language models (LLMs) to support diverse user requests. While existing work has made progress in aligning LLMs with human preferences, evaluating their capabilities on instruction following remains a challenge due to complexity and diversity of real-world user instructions. While existing evaluation methods focus on general skills, they suff…

    Submitted 4 July, 2024; originally announced July 2024.

    Journal ref: AAAI 2024

  4. arXiv:2407.03568  [pdf, other]

    cs.SI cs.IR

    When LLM Meets Hypergraph: A Sociological Analysis on Personality via Online Social Networks

    Authors: Zhiyao Shu, Xiangguo Sun, Hong Cheng

    Abstract: Individual personalities significantly influence our perceptions, decisions, and social interactions, which is particularly crucial for gaining insights into human behavior patterns in online social network analysis. Many psychological studies have observed that personalities are strongly reflected in their social behaviors and social environments. In light of these problems, this paper proposes a…

    Submitted 3 July, 2024; originally announced July 2024.

  5. Small Aerial Target Detection for Airborne Infrared Detection Systems using LightGBM and Trajectory Constraints

    Authors: Xiaoliang Sun, Liangchao Guo, Wenlong Zhang, Zi Wang, Qifeng Yu

    Abstract: Factors, such as rapid relative motion, clutter background, etc., make robust small aerial target detection for airborne infrared detection systems a challenge. Existing methods are facing difficulties when dealing with such cases. We consider that a continuous and smooth trajectory is critical in boosting small infrared aerial target detection performance. A simple and effective small aerial targ…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 15 pages, 10 figures

    Journal ref: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 14 9959-9973 2021

  6. arXiv:2407.00386  [pdf, other]

    cs.NE cs.AI

    Multi-task multi-constraint differential evolution with elite-guided knowledge transfer for coal mine integrated energy system dispatching

    Authors: Canyun Dai, Xiaoyan Sun, Hejuan Hu, Wei Song, Yong Zhang, Dunwei Gong

    Abstract: The dispatch optimization of coal mine integrated energy system is challenging due to high dimensionality, strong coupling constraints, and multiobjective. Existing constrained multiobjective evolutionary algorithms struggle with locating multiple small and irregular feasible regions, making them inapplicable to this problem. To address this issue, we here develop a multitask evolutionary algorithm…

    Submitted 29 June, 2024; originally announced July 2024.

  7. arXiv:2406.17431  [pdf, other]

    cs.SE

    A Large-scale Investigation of Semantically Incompatible APIs behind Compatibility Issues in Android Apps

    Authors: Shidong Pan, Tianchen Guo, Lihong Zhang, Pei Liu, Zhenchang Xing, Xiaoyu Sun

    Abstract: Application Programming Interface (API) incompatibility is a long-standing issue in Android application development. The rapid evolution of Android APIs results in a significant number of API additions, removals, and changes between adjacent versions. Unfortunately, this high frequency of alterations may lead to compatibility issues, often without adequate notification to developers regarding thes…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.17156  [pdf, other]

    cs.GR cs.HC

    Toward Ubiquitous 3D Object Digitization: A Wearable Computing Framework for Non-Invasive Physical Property Acquisition

    Authors: Yunxiang Zhang, Xin Sun, Dengfeng Li, Xinge Yu, Qi Sun

    Abstract: Accurately digitizing physical objects is central to many applications, including virtual/augmented reality, industrial design, and e-commerce. Prior research has demonstrated efficient and faithful reconstruction of objects' geometric shapes and visual appearances, which suffice for digitally representing rigid objects. In comparison, physical properties, such as elasticity and pressure, are also…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 10 pages, 6 figures

  9. arXiv:2406.16449  [pdf, other]

    cs.CV

    Evaluating and Analyzing Relationship Hallucinations in LVLMs

    Authors: Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

    Abstract: The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, which is essential for visual comprehension. In this work, we introduce R-Benc…

    Submitted 2 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: ICML 2024; Project Page: https://github.com/mrwu-mac/R-Bench

  10. arXiv:2406.16054  [pdf, ps, other]

    cs.LO

    On the Relative Completeness of Satisfaction-based Probabilistic Hoare Logic With While Loop

    Authors: Xin Sun, Xingchi Su, Xiaoning Bian, Anran Cui

    Abstract: Probabilistic Hoare logic (PHL) is an extension of Hoare logic and is specifically useful in verifying randomized programs. It allows researchers to formally reason about the behavior of programs with stochastic elements, ensuring the desired probabilistic properties are upheld. The relative completeness of satisfaction-based PHL has been an open problem ever since the birth of the first PHL in 19…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 13 pages. arXiv admin note: text overlap with arXiv:2405.01940

    MSC Class: 03B70 Logic in computer science; ACM Class: F.3

  11. arXiv:2406.14965  [pdf, other]

    cs.NI

    Energy-Aware Random Access Networks: Connection-Based versus Packet-Based

    Authors: Anshan Yuan, Fangming Zhao, Xinghua Sun

    Abstract: Characterizing and comparing the optimal energy efficiency in energy-aware machine-to-machine (M2M) random access networks remains a challenge due to the distributed nature of the access behavior of nodes. To address this issue, this letter focuses on the energy efficiency limits of two typical random access schemes, i.e., connection-based Aloha and packet-based Aloha, based on which we conducted…

    Submitted 21 June, 2024; originally announced June 2024.

  12. arXiv:2406.14278  [pdf, ps, other]

    cs.DS

    Efficient Deterministic Algorithms for Maximizing Symmetric Submodular Functions

    Authors: Zongqi Wan, Jialin Zhang, Xiaoming Sun, Zhijie Zhang

    Abstract: Symmetric submodular maximization is an important class of combinatorial optimization problems, including MAX-CUT on graphs and hyper-graphs. The state-of-the-art algorithm for the problem over general constraints has an approximation ratio of $0.432$. The algorithm applies the canonical continuous greedy technique that involves a sampling process. It, therefore, suffers from high query complexity…

    Submitted 20 June, 2024; originally announced June 2024.

  13. arXiv:2406.13607  [pdf, other]

    cs.CV

    Ultra-High-Definition Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution

    Authors: Liyan Wang, Cong Wang, Jinshan Pan, Weixiang Zhou, Xiaoran Sun, Wei Wang, Zhixun Su

    Abstract: Ultra-High-Definition (UHD) image restoration has acquired remarkable attention due to its practical demand. In this paper, we construct UHD snow and rain benchmarks, named UHD-Snow and UHD-Rain, to remedy the deficiency in this field. The UHD-Snow/UHD-Rain is established by simulating the physics process of rain/snow into consideration and each benchmark contains 3200 degraded/clear image pairs o…

    Submitted 22 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  14. arXiv:2406.13457  [pdf, other]

    cs.CV cs.AI

    EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

    Authors: Dachun Kai, Jiayao Lu, Yueyi Zhang, Xiaoyan Sun

    Abstract: Event-based vision has drawn increasing attention due to its unique characteristics, such as high temporal resolution and high dynamic range. It has been used in video super-resolution (VSR) recently to enhance the flow estimation and temporal alignment. Rather than for motion learning, we propose in this paper the first VSR method that utilizes event signals for texture enhancement. Our method, c…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICML 2024. Project page: https://dachunkai.github.io/evtexture.github.io/

  15. arXiv:2406.12918  [pdf, other]

    cs.LG cs.NE

    Brain-Inspired Spike Echo State Network Dynamics for Aero-Engine Intelligent Fault Prediction

    Authors: Mo-Ran Liu, Tao Sun, Xi-Ming Sun

    Abstract: Aero-engine fault prediction aims to accurately predict the development trend of the future state of aero-engines, so as to diagnose faults in advance. Traditional aero-engine parameter prediction methods mainly use the nonlinear mapping relationship of time series data but generally ignore the adequate spatiotemporal features contained in aero-engine data. To this end, we propose a brain-inspired…

    Submitted 14 June, 2024; originally announced June 2024.

  16. arXiv:2406.12304  [pdf, other]

    cs.CL

    COT: A Generative Approach for Hate Speech Counter-Narratives via Contrastive Optimal Transport

    Authors: Linhao Zhang, Li Jin, Guangluan Xu, Xiaoyu Li, Xian Sun

    Abstract: Counter-narratives, which are direct responses consisting of non-aggressive fact-based arguments, have emerged as a highly effective approach to combat the proliferation of hate speech. Previous methodologies have primarily focused on fine-tuning and post-editing techniques to ensure the fluency of generated contents, while overlooking the critical aspects of individualization and relevance concer…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: IEEE journals

    MSC Class: 68U15; ACM Class: I.2.7

  17. arXiv:2406.12079  [pdf, other]

    cs.CV cs.AI cs.LG

    Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

    Authors: Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez

    Abstract: As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pru…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under Review

  18. arXiv:2406.11441  [pdf, other]

    cs.CV

    SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

    Authors: Zhenchao Lin, Li He, Hongqiang Yang, Xiaoqun Sun, Cuojin Zhang, Weinan Chen, Yisheng Guan, Hong Zhang

    Abstract: Large-scale point cloud consists of a multitude of individual objects, thereby encompassing rich structural and underlying semantic contextual information, resulting in a challenging problem in efficiently segmenting a point cloud. Most existing researches mainly focus on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context.…

    Submitted 17 June, 2024; originally announced June 2024.

  19. arXiv:2406.11432  [pdf, other]

    cs.CV cs.AI

    AnyTrans: Translate AnyText in the Image with Large Scale Models

    Authors: Zhipeng Qian, Pei Zhang, Baosong Yang, Kai Fan, Yiwei Ma, Derek F. Wong, Xiaoshuai Sun, Rongrong Ji

    Abstract: This paper introduces AnyTrans, an all-encompassing framework for the task-Translate AnyText in the Image (TATI), which includes multilingual text translation and text fusion within images. Our framework leverages the strengths of large-scale models, such as Large Language Models (LLMs) and text-guided diffusion models, to incorporate contextual cues from both textual and visual elements during tr…

    Submitted 17 June, 2024; originally announced June 2024.

  20. arXiv:2406.10957  [pdf, other]

    cs.CL

    Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence

    Authors: Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun

    Abstract: Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning from Human Feedback (RLHF). Despite its promising efficacy, DPO faces a notable drawback: "verbosity", a common over-optimization phenomenon also observ…

    Submitted 16 June, 2024; originally announced June 2024.

  21. arXiv:2406.10917  [pdf, other]

    cs.LG stat.ML

    Bayesian Intervention Optimization for Causal Discovery

    Authors: Yuxuan Wang, Mingzhou Liu, Xinwei Sun, Wei Wang, Yizhou Wang

    Abstract: Causal discovery is crucial for understanding complex systems and informing decisions. While observational data can uncover causal relationships under certain assumptions, it often falls short, making active interventions necessary. Current methods, such as Bayesian and graph-theoretical approaches, do not prioritize decision-making and often rely on ideal conditions or information gain, which is…

    Submitted 16 June, 2024; originally announced June 2024.

  22. arXiv:2406.10228  [pdf, other]

    cs.CV cs.AI cs.CL

    VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

    Authors: Chenyu Zhou, Mengdan Zhang, Peixian Chen, Chaoyou Fu, Yunhang Shen, Xiawu Zheng, Xing Sun, Rongrong Ji

    Abstract: The swift progress of Multi-modal Large Models (MLLMs) has showcased their impressive ability to tackle tasks blending vision and language. Yet, most current models and benchmarks cater to scenarios with a narrow scope of visual and textual contexts. These models often fall short when faced with complex comprehension tasks, which involve navigating through a plethora of irrelevant and potentially…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://zhourax.github.io/VEGA/

  23. arXiv:2406.09988  [pdf, other]

    cs.AI cs.CL cs.RO

    Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

    Authors: Xiaowen Sun, Xufeng Zhao, Jae Hee Lee, Wenhao Lu, Matthias Kerzel, Stefan Wermter

    Abstract: The state of an object reflects its current status or condition and is important for a robot's task planning and manipulation. However, detecting an object's state and generating a state-sensitive plan for robots is challenging. Recently, pre-trained Large Language Models (LLMs) and Vision-Language Models (VLMs) have shown impressive capabilities in generating plans. However, to the best of our kn…

    Submitted 14 June, 2024; originally announced June 2024.

  24. arXiv:2406.09410  [pdf, other]

    cs.CV cs.AI

    STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it attractive to holistically conduct SGG in large-size very-high-resolution (VHR…

    Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures

  25. arXiv:2406.09371  [pdf, other]

    cs.CV cs.LG

    LRM-Zero: Training Large Reconstruction Models with Synthesized Data

    Authors: Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan

    Abstract: We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D data…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 23 pages, 8 figures. Our code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/

  26. arXiv:2406.07032  [pdf, other]

    cs.CV

    RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks

    Authors: Zhechao Wang, Peirui Cheng, Pengju Tian, Yuchao Wang, Mingxin Chen, Shujing Duan, Zhirui Wang, Xinming Li, Xian Sun

    Abstract: Remote sensing lightweight foundation models have achieved notable success in online perception within remote sensing. However, their capabilities are restricted to performing online inference solely based on their own observations and models, thus lacking a comprehensive understanding of large-scale remote sensing scenarios. To overcome this limitation, we propose a Remote Sensing Distributed Fou…

    Submitted 11 June, 2024; originally announced June 2024.

  27. arXiv:2406.06379  [pdf, other]

    cs.CE

    FinVerse: An Autonomous Agent System for Versatile Financial Analysis

    Authors: Siyu An, Qin Li, Junru Lu, Di Yin, Xing Sun

    Abstract: With the significant advancements in cognitive intelligence driven by LLMs, autonomous agent systems have attracted extensive attention. Despite this growing interest, the development of stable and efficient agent systems poses substantial practical challenges. In this paper, we introduce FinVerse, a meticulously crafted agent system designed for a broad range of financial topics. FinVerse integra…

    Submitted 10 June, 2024; originally announced June 2024.

  28. arXiv:2406.06028  [pdf, other]

    cs.CV

    ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery

    Authors: Xian Sun, Qiwei Yan, Chubo Deng, Chenglong Liu, Yi Jiang, Zhongyan Hou, Wanxuan Lu, Fanglong Yao, Xiaoyu Liu, Lingxiang Hao, Hongfeng Yu

    Abstract: Scene Graph Generation (SGG) is a high-level visual understanding and reasoning task aimed at extracting entities (such as objects) and their interrelationships from images. Significant progress has been made in the study of SGG in natural images in recent years, but its exploration in the domain of remote sensing images remains very limited. The complex characteristics of remote sensing images ne…

    Submitted 10 June, 2024; originally announced June 2024.

  29. arXiv:2406.05620  [pdf, other]

    cs.CV

    Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval

    Authors: Yiwei Ma, Xiaoshuai Sun, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji

    Abstract: Text-based person retrieval (TPR) is a challenging task that involves retrieving a specific individual based on a textual description. Despite considerable efforts to bridge the gap between vision and language, the significant differences between these modalities continue to pose a challenge. Previous methods have attempted to align text and image samples in a modal-shared space, but they face unc…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: ACM MM2023

  30. arXiv:2406.05346  [pdf, other]

    cs.LG

    ProG: A Graph Prompt Learning Benchmark

    Authors: Chenyi Zi, Haihong Zhao, Xiangguo Sun, Yiqing Lin, Hong Cheng, Jia Li

    Abstract: Artificial general intelligence on graphs has shown significant advancements across various applications, yet the traditional 'Pre-train & Fine-tune' paradigm faces inefficiencies and negative transfer issues, particularly in complex and few-shot settings. Graph prompt learning emerges as a promising alternative, leveraging lightweight prompts to manipulate data and fill the task gap by reformulat…

    Submitted 19 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  31. arXiv:2406.04942  [pdf, other]

    cs.CV

    Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement

    Authors: Wei Qian, Qi Li, Kun Li, Xinke Wang, Xiao Sun, Meng Wang, Dan Guo

    Abstract: This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement in the 3rd Vision-based Remote Physiological Signal Sensing (RePSS) Challenge hosted at IJCAI 2024. The goal is to develop a self-supervised learning algorithm for heart rate (HR) estimation using unlabeled facial videos. To tackle this task, we present two self-superv…

    Submitted 7 June, 2024; originally announced June 2024.

  32. arXiv:2406.04648  [pdf, other]

    cs.CV

    UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping

    Authors: Pengju Tian, Peirui Cheng, Yuchao Wang, Zhechao Wang, Zhirui Wang, Menglong Yan, Xue Yang, Xian Sun

    Abstract: Multi-UAV collaborative 3D object detection can perceive and comprehend complex environments by integrating complementary information, with applications encompassing traffic monitoring, delivery services and agricultural management. However, the extremely broad observations in aerial remote sensing and significant perspective differences across multiple UAVs make it challenging to achieve precise…

    Submitted 7 June, 2024; originally announced June 2024.

  33. arXiv:2406.04647  [pdf, other]

    cs.CV

    UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection

    Authors: Yuchao Wang, Peirui Cheng, Pengju Tian, Ziyang Yuan, Liangjin Zhao, Jing Tian, Wensheng Wang, Zhirui Wang, Xian Sun

    Abstract: With the advancement of collaborative perception, the role of aerial-ground collaborative perception, a crucial component, is becoming increasingly important. The demand for collaborative perception across different perspectives to construct more comprehensive perceptual information is growing. However, challenges arise due to the disparities in the field of view between cross-domain agents and th…

    Submitted 7 June, 2024; originally announced June 2024.

  34. arXiv:2406.03075  [pdf, other]

    cs.CL

    Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

    Authors: Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan

    Abstract: The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial val…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 3 figures

  35. arXiv:2406.02425  [pdf, other]

    cs.CV cs.RO

    CoNav: A Benchmark for Human-Centered Collaborative Navigation

    Authors: Changhao Li, Xinyu Sun, Peihao Chen, Jugang Fan, Zixu Wang, Yanxia Liu, Jinhui Zhu, Chuang Gan, Mingkui Tan

    Abstract: Human-robot collaboration, in which the robot intelligently assists the human with the upcoming task, is an appealing objective. To achieve this goal, the agent needs to be equipped with a fundamental collaborative navigation ability, where the agent should reason human intention by observing human activities and then navigate to the human's intended destination in advance of the human. However, t…

    Submitted 4 June, 2024; originally announced June 2024.

  36. arXiv:2406.01451  [pdf, other]

    cs.CV cs.MM

    SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

    Authors: Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this paper, we introduce SemiRES, a semi-supervised framework that effectively leverages a combination of labeled and unlabeled data to perform RES. A significant hurdle in applying semi-supervised techniques to RES is the prevalence of noisy pseudo-labels, particularly at the boundaries of objects. SemiRES incorporates the Segment Anything Model (SAM), renowned for its precise boundary demarca…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  37. arXiv:2406.00334  [pdf, other]

    cs.CV

    Image Captioning via Dynamic Path Customization

    Authors: Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji

    Abstract: This paper explores a novel dynamic network for vision and language tasks, where the inferring structure is customized on the fly for different inputs. Most previous state-of-the-art approaches are static and hand-crafted networks, which not only heavily rely on expert knowledge, but also ignore the semantic diversity of input samples, therefore resulting in suboptimal performance. To address thes…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: TNNLS24

  38. arXiv:2406.00276  [pdf]

    cs.LG cs.AI cs.CE physics.data-an

    Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

    Authors: Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou, Heng Chang, Tingwei Cao, Xiao Xiao, Yaojun Liu, Wenjun Yu, Zhongling Xu, Yang Li, Han Hao, Xuan Zhang, Xiaosong Hu, Guangmin Zhou

    Abstract: Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed mac…

    Submitted 31 May, 2024; originally announced June 2024.

    ACM Class: J.2; G.3

  39. arXiv:2406.00143  [pdf, other]

    cs.CV

    Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding

    Authors: Xiaolong Sun, Liushuai Shi, Le Wang, Sanping Zhou, Kun Xia, Yabing Wang, Gang Hua

    Abstract: Temporal sentence grounding is a challenging task that aims to localize the moment spans relevant to a language description. Although recent DETR-based models have achieved notable progress by leveraging multiple learnable moment queries, they suffer from overlapped and redundant proposals, leading to inaccurate predictions. We attribute this limitation to the lack of task-related guidance for the…

    Submitted 31 May, 2024; originally announced June 2024.

  40. arXiv:2405.21075  [pdf, other]

    cs.CV cs.CL

    Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

    Authors: Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

    Abstract: In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. The potential of MLLMs in processing sequential visual data is still insufficiently explored, highlighting the absence of a comprehensive, high-quality…

    Submitted 16 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Project Page: https://video-mme.github.io

  41. arXiv:2405.20985  [pdf, other]

    cs.CV

    DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

    Authors: Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou

    Abstract: The visual projector, which bridges the vision and language modalities and facilitates cross-modal alignment, serves as a crucial component in MLLMs. However, measuring the effectiveness of projectors in vision-language alignment remains under-explored, which currently can only be inferred from the performance of MLLMs on downstream tasks. Motivated by the problem, this study examines the projecto…

    Submitted 31 May, 2024; originally announced May 2024.

  42. arXiv:2405.20339  [pdf, other]

    cs.CV

    Visual Perception by Large Language Model's Weights

    Authors: Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun

    Abstract: Existing Multimodal Large Language Models (MLLMs) follow the paradigm that perceives visual information by aligning visual features with the input space of Large Language Models (LLMs), and concatenating visual tokens with text tokens to form a unified sequence input for LLMs. These methods demonstrate promising results on various vision-language tasks but are limited by the high computational eff…

    Submitted 30 May, 2024; originally announced May 2024.

  43. arXiv:2405.19735  [pdf, other]

    cs.CV

    Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes

    Authors: Yong-Qiang Mao, Hanbo Bi, Xuexue Li, Kaiqiang Chen, Zhirui Wang, Xian Sun, Kun Fu

    Abstract: Thanks to the application of deep learning technology in point cloud processing of the remote sensing field, point cloud segmentation has become a research hotspot in recent years, which can be applied to real-world 3D, smart cities, and other fields. Although existing solutions have made unprecedented progress, they ignore the inherent characteristics of point clouds in remote sensing fields that…

    Submitted 30 May, 2024; originally announced May 2024.

  44. arXiv:2405.19333  [pdf, other]

    cs.CV

    Multi-Modal Generative Embedding Model

    Authors: Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun

    Abstract: Most multi-modal tasks can be formulated into problems of either generation or embedding. Existing models usually tackle these two types of problems by decoupling language modules into a text decoder for generation, and a text encoder for embedding. To explore the minimalism of multi-modal paradigms, we attempt to achieve only one model per modality in this work. We propose a Multi-Modal Generativ…

    Submitted 29 May, 2024; originally announced May 2024.

  45. arXiv:2405.19299  [pdf, other]

    cs.CL

    Expert-Guided Extinction of Toxic Tokens for Debiased Generation

    Authors: Xueyao Sun, Kaize Shi, Haoran Tang, Guandong Xu, Qing Li

    Abstract: Large language models (LLMs) can elicit social bias during generations, especially when inference with toxic prompts. Controlling the sensitive attributes in generation encounters challenges in data distribution, generalizability, and efficiency. Specifically, fine-tuning and retrieval demand extensive unbiased corpus, while direct prompting requires meticulously curated instructions for correctin…

    Submitted 29 May, 2024; originally announced May 2024.

  46. arXiv:2405.18563  [pdf, other]

    cs.LG stat.ME

    Counterfactual Explanations for Multivariate Time-Series without Training Datasets

    Authors: Xiangyu Sun, Raquel Aoki, Kevin H. Wilson

    Abstract: Machine learning (ML) methods have experienced significant growth in the past decade, yet their practical application in high-impact real-world domains has been hindered by their opacity. When ML methods are responsible for making critical decisions, stakeholders often require insights into how to alter these decisions. Counterfactual explanations (CFEs) have emerged as a solution, offering interp…

    Submitted 28 May, 2024; originally announced May 2024.

  47. arXiv:2405.17140  [pdf, other]

    cs.CV

    SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

    Authors: Yong-Qiang Mao, Hanbo Bi, Liangyu Xu, Kaiqiang Chen, Zhirui Wang, Xian Sun, Kun Fu

    Abstract: Research on multi-view stereo based on remote sensing images has promoted the development of large-scale urban 3D reconstruction. However, remote sensing multi-view image data suffers from the problems of occlusion and uneven brightness between views during acquisition, which leads to the problem of blurred details in depth estimation. To solve the above problem, we re-examine the deformable learn…

    Submitted 27 May, 2024; originally announced May 2024.

  48. arXiv:2405.17083  [pdf, other]

    cs.CV

    F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting

    Authors: Xiangyu Sun, Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Usman Ali, Eunbyung Park

    Abstract: The neural radiance field (NeRF) has made significant strides in representing 3D scenes and synthesizing novel views. Despite its advancements, the high computational costs of NeRF have posed challenges for its deployment in resource-constrained environments and real-time applications. As an alternative to NeRF-like neural rendering methods, 3D Gaussian Splatting (3DGS) offers rapid rendering spee…

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Our project page including code is available at https://xiangyu1sun.github.io/Factorize-3DGS/

  49. arXiv:2405.17003  [pdf, other]

    cs.LG

    Graph Condensation for Open-World Graph Learning

    Authors: Xinyi Gao, Tong Chen, Wentao Zhang, Yayong Li, Xiangguo Sun, Hongzhi Yin

    Abstract: The burgeoning volume of graph data presents significant computational challenges in training graph neural networks (GNNs), critically impeding their efficiency in various applications. To tackle this challenge, graph condensation (GC) has emerged as a promising acceleration solution, focusing on the synthesis of a compact yet representative graph for efficiently training GNNs while retaining perf…

    Submitted 12 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  50. arXiv:2405.15758  [pdf, other]

    cs.CV cs.AI

    InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

    Authors: Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian

    Abstract: Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering f…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Project page: https://wangyuchi369.github.io/InstructAvatar/