
Showing 1–50 of 542 results for author: Gao, C

Searching in archive cs.
  1. arXiv:2408.12981 [pdf, other]

    cs.AI

    QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval

    Authors: Chenghua Gao, Min Li, Jianshuo Liu, Junxing Ren, Lin Chen, Haoyu Liu, Bo Meng, Jitao Fu, Wenwen Su

    Abstract: Video Moment Retrieval (VMR) aims to retrieve relevant moments of an untrimmed video corresponding to the query. While cross-modal interaction approaches have shown progress in filtering out query-irrelevant information in videos, they assume the precise alignment between the query semantics and the corresponding video moments, potentially overlooking the misunderstanding of the natural language s…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures, 4 tables

  2. arXiv:2408.12687 [pdf, other]

    cs.HC

    Bridging the gap between natural user expression with complex automation programming in smart homes

    Authors: Yingtian Shi, Xiaoyi Liu, Chun Yu, Tianao Yang, Cheng Gao, Chen Liang, Yuanchun Shi

    Abstract: A long-standing challenge in end-user programming (EUP) is to trade off between natural user expression and the complexity of programming tasks. As large language models (LLMs) are empowered to handle semantic inference and natural language understanding, it remains under-explored how such capabilities can facilitate end-users to configure complex automation more naturally and easily. We propose A…

    Submitted 22 August, 2024; originally announced August 2024.

  3. arXiv:2408.12470 [pdf, other]

    cs.IR

    DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems

    Authors: Jiaju Chen, Chongming Gao, Shuai Yuan, Shuchang Liu, Qingpeng Cai, Peng Jiang

    Abstract: The integration of Large Language Models (LLMs) into recommender systems has led to substantial performance improvements. However, this often comes at the cost of diminished recommendation diversity, which can negatively impact user satisfaction. To address this issue, controllable recommendation has emerged as a promising approach, allowing users to specify their preferences and receive recommend…

    Submitted 22 August, 2024; originally announced August 2024.

  4. arXiv:2408.12159 [pdf, other]

    cs.SE cs.AI cs.CL

    Search-Based LLMs for Code Optimization

    Authors: Shuzheng Gao, Cuiyun Gao, Wenchao Gu, Michael Lyu

    Abstract: The code written by developers usually suffers from efficiency problems and contain various performance bugs. These inefficiencies necessitate the research of automated refactoring methods for code optimization. Early research in code optimization employs rule-based methods and focuses on specific inefficiency issues, which are labor-intensive and suffer from the low coverage issue. Recent work re…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE'25)

  5. arXiv:2408.08661 [pdf, other]

    cs.CL cs.CR cs.LG

    MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector

    Authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang

    Abstract: The increasing parameters and expansive dataset of large language models (LLMs) highlight the urgent demand for a technical solution to audit the underlying privacy risks and copyright issues associated with LLMs. Existing studies have partially addressed this need through an exploration of the pre-training data detection problem, which is an instance of a membership inference attack (MIA). This p…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: code and dataset: https://github.com/wjfu99/MIA-Tuner

  6. arXiv:2408.06809 [pdf, other]

    cs.IR

    Reformulating Conversational Recommender Systems as Tri-Phase Offline Policy Learning

    Authors: Gangyi Zhang, Chongming Gao, Hang Pan, Runzhe Teng, Ruizhe Li

    Abstract: Existing Conversational Recommender Systems (CRS) predominantly utilize user simulators for training and evaluating recommendation policies. These simulators often oversimplify the complexity of user interactions by focusing solely on static item attributes, neglecting the rich, evolving preferences that characterize real-world user behavior. This limitation frequently leads to models that perform…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted at CIKM 2024

  7. arXiv:2408.06123 [pdf, other]

    cs.CV cs.MM

    DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection

    Authors: Junjie Guo, Chenqiang Gao, Fangcen Liu, Deyu Meng

    Abstract: Infrared-visible object detection aims to achieve robust object detection by leveraging the complementary information of infrared and visible image pairs. However, the commonly existing modality misalignment problem presents two challenges: fusing misalignment complementary features is difficult, and current methods cannot accurately locate objects in both modalities under misalignment conditions.…

    Submitted 12 August, 2024; originally announced August 2024.

  8. arXiv:2408.05002 [pdf, other]

    cs.SE

    An Empirical Study on Challenges for LLM Developers

    Authors: Xiang Chen, Chaoyang Gao, Chunyang Chen, Guangbei Zhang, Yong Liu

    Abstract: In recent years, large language models (LLMs) have seen rapid advancements, significantly impacting various fields such as natural language processing, and software engineering. These LLMs, exemplified by OpenAI's ChatGPT, have revolutionized the way we approach language understanding and generation tasks. However, in contrast to traditional software development practices, LLM development introduc…

    Submitted 11 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: 29 pages, 15 figures

  9. Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning

    Authors: Hongze Zhu, Guoyang Xie, Chengbin Hou, Tao Dai, Can Gao, Jinbao Wang, Linlin Shen

    Abstract: High-resolution point clouds (HRPCD) anomaly detection (AD) plays a critical role in precision machining and high-end equipment manufacturing. Despite considerable 3D-AD methods that have been proposed recently, they still cannot meet the requirements of the HRPCD-AD task. There are several challenges: i) It is difficult to directly capture HRPCD information due to large amounts of points at the s…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: ACMMM24, 12 pages, 5 figures

  10. arXiv:2408.03680 [pdf, other]

    cs.SE

    Iterative Knowledge Distillation through Feedback-Driven Learning Cycles

    Authors: Yujia Chen, Yang Ye, Zhongqi Li, Yuchi Ma, Cuiyun Gao

    Abstract: Large code models (LCMs) have remarkably advanced the field of code intelligence. Despite their impressive capabilities, they still face practical employment challenges, such as high costs, limited accessibility of proprietary LCMs, and adaptability issues of ultra-large LCMs. These challenges highlight the critical need for more accessible, lightweight yet effective LCMs. In this paper, we propos…

    Submitted 7 August, 2024; originally announced August 2024.

  11. arXiv:2408.03519 [pdf, other]

    cs.SE cs.AI

    RepoMasterEval: Evaluating Code Completion via Real-World Repositories

    Authors: Qinyun Wu, Chao Peng, Pengfei Gao, Ruida Hu, Haoyu Gan, Bo Jiang, Jinhe Tang, Zhiwen Deng, Zhanming Guan, Cuiyun Gao, Xia Liu, Ping Yang

    Abstract: With the growing reliance on automated code completion tools in software development, the need for robust evaluation benchmarks has become critical. However, existing benchmarks focus more on code generation tasks in function and class level and provide rich text description to prompt the model. By contrast, such descriptive prompt is commonly unavailable in real development and code completion ca…

    Submitted 6 August, 2024; originally announced August 2024.

  12. arXiv:2407.17115 [pdf, other]

    cs.IR

    Reinforced Prompt Personalization for Recommendation with Large Language Models

    Authors: Wenyu Mao, Jiancan Wu, Weijian Chen, Chongming Gao, Xiang Wang, Xiangnan He

    Abstract: Designing effective prompts can empower LLMs to understand user preferences and provide recommendations by leveraging LLMs' intent comprehension and knowledge utilization capabilities. However, existing research predominantly concentrates on task-wise prompting, developing fixed prompt templates composed of four patterns (i.e., role-playing, history records, reasoning guidance, and output format)…

    Submitted 24 July, 2024; originally announced July 2024.

  13. arXiv:2407.16729 [pdf, other]

    cs.LG cs.AI

    PateGail: A Privacy-Preserving Mobility Trajectory Generator with Imitation Learning

    Authors: Huandong Wang, Changzheng Gao, Yuchen Wu, Depeng Jin, Lina Yao, Yong Li

    Abstract: Generating human mobility trajectories is of great importance to solve the lack of large-scale trajectory data in numerous applications, which is caused by privacy concerns. However, existing mobility trajectory generation methods still require real-world human trajectories centrally collected as the training data, where there exists an inescapable risk of privacy leakage. To overcome this limitat…

    Submitted 23 July, 2024; originally announced July 2024.

  14. arXiv:2407.11906 [pdf, other]

    cs.CV cs.RO

    SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge

    Authors: Hao Ding, Tuxun Lu, Yuqian Zhang, Ruixing Liang, Hongchao Shu, Lalithkumar Seenivasan, Yonghao Long, Qi Dou, Cong Gao, Mathias Unberath

    Abstract: Accurate segmentation of tools in robot-assisted surgery is critical for machine perception, as it facilitates numerous downstream tasks including augmented reality feedback. While current feed-forward neural network-based methods exhibit excellent segmentation performance under ideal conditions, these models have proven susceptible to even minor corruptions, significantly impairing the model's pe…

    Submitted 16 July, 2024; originally announced July 2024.

  15. arXiv:2407.10223 [pdf, other]

    cs.LG cs.CR

    Practical Unlearning for Large Language Models

    Authors: Chongyang Gao, Lixu Wang, Chenkai Weng, Xiao Wang, Qi Zhu

    Abstract: While LLMs have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning (MU) has emerged as a promising solution to address these issues by removing the influence of undesired data on the target model without compromising its utility in other aspects. MU typically assumes full access to the original training da…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures. The first two authors contribute equally and they are ordered alphabetically

  16. arXiv:2407.09793 [pdf, other]

    cs.SE

    Uncovering Weaknesses in Neural Code Generation

    Authors: Xiaoli Lian, Shuaisong Wang, Jieping Ma, Fang Liu, Xin Tan, Li Zhang, Lin Shi, Cuiyun Gao

    Abstract: Code generation, the task of producing source code from prompts, has seen significant advancements with the advent of pre-trained large language models (PLMs). Despite these achievements, there lacks a comprehensive taxonomy of weaknesses about the benchmark and the generated code, which risks the community's focus on known issues at the cost of under-explored areas. Our systematic study aims to…

    Submitted 17 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  17. arXiv:2407.09693 [pdf, other]

    cs.LG cs.AI

    A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems

    Authors: Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor

    Abstract: The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symboli…

    Submitted 12 July, 2024; originally announced July 2024.

  18. arXiv:2407.09690 [pdf, other]

    cs.LG cs.CR math.OC

    Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses

    Authors: Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright

    Abstract: We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this context, every silo (e.g. hospital) has data from several people (e.g. patients) and needs to protect the privacy of each person's data (e.g. health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential…

    Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: The 41st International Conference on Machine Learning (ICML 2024)

  19. arXiv:2407.08931 [pdf, other]

    cs.CV

    Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

    Authors: Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

    Abstract: Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous l…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  20. arXiv:2407.08681 [pdf, other]

    cs.RO cs.LG eess.SY

    Hardware Neural Control of CartPole and F1TENTH Race Car

    Authors: Marcin Paluch, Florian Bolli, Xiang Deng, Antonio Rios Navarro, Chang Gao, Tobi Delbruck

    Abstract: Nonlinear model predictive control (NMPC) has proven to be an effective control method, but it is expensive to compute. This work demonstrates the use of hardware FPGA neural network controllers trained to imitate NMPC with supervised learning. We use these Neural Controllers (NCs) implemented on inexpensive embedded FPGA hardware for high frequency control on physical cartpole and F1TENTH race ca…

    Submitted 11 July, 2024; originally announced July 2024.

  21. arXiv:2407.04451 [pdf, other]

    cs.LG cs.AI

    Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

    Authors: Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang

    Abstract: Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications. Existing works rely on extracting step-wise reward signals from trajectory-wise preference annotations, assuming that preferences correlate with the cumulative…

    Submitted 5 July, 2024; originally announced July 2024.

  22. arXiv:2407.04051 [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  23. arXiv:2407.03361 [pdf, ps, other]

    cs.SD cs.AI eess.AS

    PianoBART: Symbolic Piano Music Generation and Understanding with Large-Scale Pre-Training

    Authors: Xiao Liang, Zijian Zhao, Weichao Zeng, Yutong He, Fupeng He, Yiyi Wang, Chengying Gao

    Abstract: Learning musical structures and composition patterns is necessary for both music generation and understanding, but current methods do not make uniform use of learned features to generate and comprehend music simultaneously. In this paper, we propose PianoBART, a pre-trained model that uses BART for both symbolic piano music generation and understanding. We devise a multi-level object selection str…

    Submitted 25 June, 2024; originally announced July 2024.

  24. arXiv:2407.01885 [pdf, other]

    cs.CL cs.AI

    Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

    Authors: Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen

    Abstract: Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models whil…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 28 pages

  25. arXiv:2407.01183 [pdf, other]

    cs.DB

    TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval

    Authors: Wenbo Xu, Liang Yan, Peiyi Han, Haifeng Zhu, Chuanyi Liu, Shaoming Duan, Cuiyun Gao, Yingwei Liang

    Abstract: Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware questions in real-world scenarios, ambiguous data content keywords and non-existent database schema column names within the question leads to the poor performance of existing methods. To solve this problem, we pr…

    Submitted 12 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  26. arXiv:2406.19672 [pdf, other]

    cs.CV

    Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics

    Authors: Chengrui Gao, Ziyuan Yang, Andrew Beng Jin Teoh, Min Zhu

    Abstract: Recently, finger knuckle prints (FKPs) have gained attention due to their rich textural patterns, positioning them as a promising biometric for identity recognition. Prior FKP recognition methods predominantly leverage first-order feature descriptors, which capture intricate texture details but fail to account for structural information. Emerging research, however, indicates that second-order text…

    Submitted 28 June, 2024; originally announced June 2024.

  27. arXiv:2406.18966 [pdf, other]

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap…

    Submitted 22 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  28. arXiv:2406.16655 [pdf, other]

    cs.CL

    Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

    Authors: Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separated parts: knowledge retrieval and knowledge-free reasoning, and analyze the cross-lingual transferability of them. With adapted and const…

    Submitted 24 June, 2024; originally announced June 2024.

  29. arXiv:2406.16370 [pdf, other]

    cs.RO

    An Active Search Strategy with Multiple Unmanned Aerial Systems for Multiple Targets

    Authors: Chuanxiang Gao, Xinyi Wang, Xi Chen, Ben M. Chen

    Abstract: The challenge of efficient target searching in vast natural environments has driven the need for advanced multi-UAV active search strategies. This paper introduces a novel method in which global and local information is adeptly merged to avoid issues such as myopia and redundant back-and-forth movements. In addition, a trajectory generation method is used to ensure the search pattern within contin…

    Submitted 24 June, 2024; originally announced June 2024.

  30. arXiv:2406.16121 [pdf, other]

    cs.LG cs.AI

    Diffusion Spectral Representation for Reinforcement Learning

    Authors: Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai

    Abstract: Diffusion-based models have achieved notable empirical successes in reinforcement learning (RL) due to their expressiveness in modeling complex distributions. Despite existing methods being promising, the key challenge of extending existing methods for broader real-world applications lies in the computational cost at inference time, i.e., sampling from a diffusion model is considerably slow as it…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Under review

  31. arXiv:2406.13706 [pdf, other]

    cs.CL cs.AI cs.CY

    Breaking News: Case Studies of Generative AI's Use in Journalism

    Authors: Natalie Grace Brigham, Chongjiu Gao, Tadayoshi Kohno, Franziska Roesner, Niloofar Mireshghallah

    Abstract: Journalists are among the many users of large language models (LLMs). To better understand the journalist-AI interactions, we conduct a study of LLM usage by two news agencies through browsing the WildChat dataset, identifying candidate interactions, and verifying them by matching to online published articles. Our analysis uncovers instances where journalists provide sensitive material such as con…

    Submitted 19 June, 2024; originally announced June 2024.

  32. arXiv:2406.13443 [pdf, other]

    cs.CL

    Dual-Phase Accelerated Prompt Optimization

    Authors: Muchen Yang, Moxin Li, Yongle Li, Zijun Chen, Chongming Gao, Junqi Zhang, Yangyang Li, Fuli Feng

    Abstract: Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfa…

    Submitted 19 June, 2024; originally announced June 2024.

  33. arXiv:2406.12235 [pdf, other]

    cs.CV

    Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM

    Authors: Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang

    Abstract: Towards open-ended Video Anomaly Detection (VAD), existing methods often exhibit biased detection when faced with challenging or unseen events and lack interpretability. To address these drawbacks, we propose Holmes-VAD, a novel framework that leverages precise temporal supervision and rich multimodal instructions to enable accurate anomaly localization and comprehensive explanations. Firstly, tow…

    Submitted 29 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 19 pages, 9 figures

  34. arXiv:2406.10819 [pdf, other]

    cs.CV cs.AI cs.CL

    GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

    Authors: Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding code. However, current agents primarily exhibit excellent understanding capabilities in static environments and are predominantly applied in relatively simple domains, such as Web or mobile interfaces…

    Submitted 16 June, 2024; originally announced June 2024.

  35. arXiv:2406.10292 [pdf, other]

    cs.AI cs.CL cs.LG

    Automatically Labeling $200B Life-Saving Datasets: A Large Clinical Trial Outcome Benchmark

    Authors: Chufan Gao, Jathurshan Pradeepkumar, Trisha Das, Shivashankar Thati, Jimeng Sun

    Abstract: The global cost of drug discovery and development exceeds $200 billion annually. The main results of drug discovery and development are the outcomes of clinical trials, which directly influence the regulatory approval of new drug candidates and ultimately affect patient outcomes. Despite their significance, large-scale, high-quality clinical trial outcome data are not readily available to the publ…

    Submitted 13 June, 2024; originally announced June 2024.

  36. arXiv:2406.09829 [pdf, other]

    cs.CV

    Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

    Authors: Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao

    Abstract: Open-vocabulary semantic segmentation is a challenging task, which requires the model to output semantic masks of an image beyond a close-set vocabulary. Although many efforts have been made to utilize powerful CLIP models to accomplish this task, they are still easily overfitting to training classes due to the natural gaps in semantic information between training and new classes. To overcome this…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR2024

  37. arXiv:2406.09395 [pdf, other]

    cs.CV

    Modeling Ambient Scene Dynamics for Free-view Synthesis

    Authors: Meng-Li Shih, Jia-Bin Huang, Changil Kim, Rajvi Shah, Johannes Kopf, Chen Gao

    Abstract: We introduce a novel method for dynamic free-view synthesis of an ambient scenes from a monocular capture bringing a immersive quality to the viewing experience. Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes. Previous attempts to extend 3DGS to represent dynamics have been confined to bounded scenes or require m…

    Submitted 13 June, 2024; originally announced June 2024.

  38. arXiv:2406.09333 [pdf, other]

    cs.CV

    Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

    Authors: Weiyi Wu, Chongyang Gao, Xinwen Xu, Siting Li, Jiang Gui

    Abstract: Whole Slide Images (WSIs) are crucial for modern pathological diagnosis, yet their gigapixel-scale resolutions and sparse informative regions pose significant computational challenges. Traditional dense attention mechanisms, widely used in computer vision and natural language processing, are impractical for WSI analysis due to the substantial data scale and the redundant processing of uninformativ…

    Submitted 13 June, 2024; originally announced June 2024.

  39. Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps

    Authors: Shuqing Li, Cuiyun Gao, Jianping Zhang, Yujia Zhang, Yepang Liu, Jiazhen Gu, Yun Peng, Michael R. Lyu

    Abstract: The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This work has been accepted at the ACM International Conference on the Foundations of Software Engineering (FSE) 2024, Porto de Galinhas, Brazil. DOI: https://doi.org/10.1145/3660803

  40. arXiv:2406.07393 [pdf, other]

    cs.CL

    Limited Out-of-Context Knowledge Reasoning in Large Language Models

    Authors: Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities as knowledge bases and significant in-context reasoning capabilities. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data, instead of from the context or prompt. This paper focuses on a significant facet of out-of-context reasoning: Out-of-Context…

    Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  41. arXiv:2406.05906 [pdf, other]

    cs.CL cs.AI

    TTM-RE: Memory-Augmented Document-Level Relation Extraction

    Authors: Chufan Gao, Xuan Wang, Jimeng Sun

    Abstract: Document-level relation extraction aims to categorize the association between any two entities within a document. We find that previous methods for document-level relation extraction are ineffective in exploiting the full potential of large amounts of training data with varied noise levels. For example, in the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-q…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted in ACL 2024 Main

  42. arXiv:2406.01188 [pdf, other]

    cs.CV

    UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation

    Authors: Xiang Wang, Shiwei Zhang, Changxin Gao, Jiayu Wang, Xiaoqiang Zhou, Yingya Zhang, Luxin Yan, Nong Sang

    Abstract: Recent diffusion-based human image animation techniques have demonstrated impressive success in synthesizing videos that faithfully follow a given reference identity and a sequence of desired movement poses. Despite this, there are still two limitations: i) an extra reference model is required to align the identity image with the main video branch, which significantly increases the optimization bu…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://unianimate.github.io/

  43. arXiv:2406.00380 [pdf, other]

    cs.CL cs.AI

    The Best of Both Worlds: Toward an Honest and Helpful Large Language Model

    Authors: Chujie Gao, Qihui Zhang, Dongping Chen, Yue Huang, Siyuan Wu, Zhengyan Fu, Yao Wan, Xiangliang Zhang, Lichao Sun

    Abstract: Large Language Models (LLMs) have achieved remarkable success across various industries due to their exceptional generative capabilities. However, for safe and effective real-world deployments, ensuring honesty and helpfulness is critical. This paper addresses the question: Can we prioritize the helpfulness of LLMs while preserving their honesty? To begin with, we establish exhaustive principles a…

    Submitted 22 August, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  44. arXiv:2405.20044  [pdf, other]

    cs.CV

    A Point-Neighborhood Learning Framework for Nasal Endoscope Image Segmentation

    Authors: Pengyu Jie, Wanquan Liu, Chenqiang Gao, Yihui Wen, Rui He, Pengcheng Li, Jintao Zhang, Deyu Meng

    Abstract: The lesion segmentation on endoscopic images is challenging due to its complex and ambiguous features. Fully-supervised deep learning segmentation methods can receive good performance based on entirely pixel-level labeled dataset but greatly increase experts' labeling burden. Semi-supervised and weakly supervised methods can ease labeling burden, but heavily strengthen the learning difficulty. To…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 figures

  45. arXiv:2405.19846  [pdf, other]

    cs.CL cs.AI

    Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

    Authors: Chaochen Gao, Xing Wu, Qi Fu, Songlin Hu

    Abstract: Large language models, initially pre-trained with a limited context length, can better handle longer texts by continuing training on a corpus with extended contexts. However, obtaining effective long-context data is challenging due to the scarcity and uneven distribution of long documents across different domains. To address this issue, we propose a Query-centric data synthesis method, abbreviated…

    Submitted 19 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  46. arXiv:2405.18216  [pdf, other]

    cs.SE

    A Survey on Modern Code Review: Progresses, Challenges and Opportunities

    Authors: Zezhou Yang, Cuiyun Gao, Zhaoqiang Guo, Zhenhao Li, Kui Liu, Xin Xia, Yuming Zhou

    Abstract: Over the past decade, modern code review (MCR) has been deemed as a crucial practice of software quality assurance, which is applied to improve software quality and transfer development knowledge within a software team. Despite its importance, MCR is often a complicated and time-consuming activity for practitioners. In recent years, many studies that are dedicated to the comprehension and the impr…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 62 pages

  47. arXiv:2405.15161  [pdf, other]

    cs.CR cs.CV

    Are You Copying My Prompt? Protecting the Copyright of Vision Prompt for VPaaS via Watermark

    Authors: Huali Ren, Anli Yan, Chong-zhi Gao, Hongyang Yan, Zhenxin Zhang, Jin Li

    Abstract: Visual Prompt Learning (VPL) differs from traditional fine-tuning methods in reducing significant resource consumption by avoiding updating pre-trained model parameters. Instead, it focuses on learning an input perturbation, a visual prompt, added to downstream task data for making predictions. Since learning generalizable prompts requires expert design and creation, which is technically demanding…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures

  48. arXiv:2405.14377  [pdf, other]

    cs.LG cs.AI

    CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization

    Authors: Zi Yang, Samridhi Choudhary, Xinfeng Xie, Cao Gao, Siegfried Kunzmann, Zheng Zhang

    Abstract: Training large AI models such as deep learning recommendation systems and foundation language (or multi-modal) models costs massive GPUs and computing time. The high training cost has become only affordable to big tech companies, meanwhile also causing increasing concerns about the environmental impact. This paper presents CoMERA, a Computing- and Memory-Efficient training method via Rank-Adaptive…

    Submitted 23 May, 2024; originally announced May 2024.

  49. arXiv:2405.13816  [pdf, other]

    cs.CL

    Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners

    Authors: Shimao Zhang, Changjiang Gao, Wenhao Zhu, Jiajun Chen, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Recently, Large Language Models (LLMs) have shown impressive language capabilities. While most of the existing LLMs have very unbalanced performance across different languages, multilingual alignment based on translation parallel data is an effective method to enhance the LLMs' multilingual capabilities. In this work, we discover and comprehensively investigate the spontaneous multilingual alignme…

    Submitted 18 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  50. arXiv:2405.12195  [pdf, other]

    cs.SE

    Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey

    Authors: Thiago S. Vaillant, Felipe Deveza de Almeida, Paulo Anselmo M. S. Neto, Cuiyun Gao, Jan Bosch, Eduardo Santana de Almeida

    Abstract: As Large Language Models (LLMs), including ChatGPT and analogous systems, continue to advance, their robust natural language processing capabilities and diverse applications have garnered considerable attention. Nonetheless, despite the increasing acknowledgment of the convergence of Artificial Intelligence (AI) and Software Engineering (SE), there is a lack of studies involving the impact of this…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 31 pages, 9 figures

    ACM Class: D.2.0