Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 328 results for author: Qiu, X

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.03993  [pdf, other

    cs.CL

    A Survey on Natural Language Counterfactual Generation

    Authors: Yongjie Wang, Xiaoqi Qiu, Yu Yue, Xu Guo, Zhiwei Zeng, Yuhong Feng, Zhiqi Shen

    Abstract: Natural Language Counterfactual generation aims to minimally modify a given text such that the modified text will be classified into a different class. The generated counterfactuals provide insight into the reasoning behind a model's predictions by highlighting which words significantly influence the outcomes. Additionally, they can be used to detect model fairness issues or augment the training d… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: A survey paper

    MSC Class: 68T50 ACM Class: I.2.7

  2. arXiv:2407.02837  [pdf, other

    cs.CL

    Comparing Feature-based and Context-aware Approaches to PII Generalization Level Prediction

    Authors: Kailin Zhang, Xinying Qiu

    Abstract: Protecting Personal Identifiable Information (PII) in text data is crucial for privacy, but current PII generalization methods face challenges such as uneven data distributions and limited context awareness. To address these issues, we propose two approaches: a feature-based method using machine learning to improve performance on structured inputs, and a novel context-aware framework that consider… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to IALP 2024

  3. arXiv:2406.17297  [pdf, other

    cs.CV cs.AI

    Towards Open-set Camera 3D Object Detection

    Authors: Zhuolin He, Xinrun Li, Heng Gao, Jiachen Tang, Shoumeng Qiu, Wenfu Wang, Lvjian Lu, Xuchong Qiu, Xiangyang Xue, Jian Pu

    Abstract: Traditional camera 3D object detectors are typically trained to recognize a predefined set of known object classes. In real-world scenarios, these detectors may encounter unknown objects outside the training categories and fail to identify them correctly. To address this gap, we present OS-Det3D (Open-set Camera 3D Object Detection), a two-stage training framework enhancing the ability of camera 3… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.16810  [pdf, other

    cs.LG cs.AI cs.CL

    PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs

    Authors: Xinchi Qiu, William F. Shen, Yihong Chen, Nicola Cancedda, Pontus Stenetorp, Nicholas D. Lane

    Abstract: Recently, machine unlearning, which seeks to erase specific data stored in the pre-trained or fine-tuned models, has emerged as a crucial protective measure for LLMs. However, unlearning approaches for LLMs that have been considered thus far have focused on the removal of independent data points and have not taken into account that the stored facts are logically connected to one another and form a… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.15720  [pdf, other

    cs.CL

    Scaling Laws for Fact Memorization of Large Language Models

    Authors: Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu

    Abstract: Fact knowledge memorization is crucial for Large Language Models (LLM) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLM's fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law r… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  6. arXiv:2406.15279  [pdf, other

    cs.AI cs.CL

    Cross-Modality Safety Alignment

    Authors: Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang

    Abstract: As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Input… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.13990  [pdf, other

    cs.CL

    Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation

    Authors: Qin Zhu, Qingyuan Cheng, Runyu Peng, Xiaonan Li, Tengxiao Liu, Ru Peng, Xipeng Qiu, Xuanjing Huang

    Abstract: The training process of large language models (LLMs) often involves varying degrees of test data contamination. Although current LLMs are achieving increasingly better performance on various benchmarks, their performance in practical applications does not always match their benchmark results. Leakage of benchmarks can prevent the accurate assessment of LLMs' true performance. However, constructing… ▽ More

    Submitted 23 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.12534  [pdf, other

    cs.CL

    Unified Active Retrieval for Retrieval Augmented Generation

    Authors: Qinyuan Cheng, Xiaonan Li, Shimin Li, Qin Zhu, Zhangyue Yin, Yunfan Shao, Linyang Li, Tianxiang Sun, Hang Yan, Xipeng Qiu

    Abstract: In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal. Therefore, determining whether to retrieve is crucial for RAG, which is usually referred to as Active Retrieval. However, existing active retrieval methods face two challenges: 1. They usually rely on a single criterion, which struggles with handling various types of instru… ▽ More

    Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.11847  [pdf, other

    cs.CY cs.LG

    Integrating behavior analysis with machine learning to predict online learning performance: A scientometric review and empirical study

    Authors: Jin Yuan, Xuelan Qiu, Jinran Wu, Jiesi Guo, Weide Li, You-Gan Wang

    Abstract: The interest in predicting online learning performance using ML algorithms has been steadily increasing. We first conducted a scientometric analysis to provide a systematic review of research in this area. The findings show that most existing studies apply the ML methods without considering learning behavior patterns, which may compromise the prediction accuracy and precision of the ML methods. Th… ▽ More

    Submitted 27 March, 2024; originally announced June 2024.

    Comments: 23 pages, 12 figures, 9 tables. Submitted to Computer & Education; Authorship Contribution: Yuan: Literature review, Data curation, Methodology, Software. Qiu: Literature review, Conceptualization, Methodology, Original draft writing. Wu: Scientometric analysis, Methodology. Guo: Review and editing. Li: Comment draft, Funding seeking. Wang: Comment draft

  10. FLea: Addressing Data Scarcity and Label Skew in Federated Learning via Privacy-preserving Feature Augmentation

    Authors: Tong Xia, Abhirup Ghosh, Xinchi Qiu, Cecilia Mascolo

    Abstract: Federated Learning (FL) enables model development by leveraging data distributed across numerous edge devices without transferring local data to a central server. However, existing FL methods still face challenges when dealing with scarce and label-skewed data across devices, resulting in local model overfitting and drift, consequently hindering the performance of the global model. In response to… ▽ More

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: This work was intended as a replacement of arXiv:2312.02327 and any subsequent updates will appear there

  11. arXiv:2406.06633  [pdf, other

    cs.LG

    PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

    Authors: Xiaoqi Qiu, Yongjie Wang, Xu Guo, Zhiwei Zeng, Yue Yu, Yuhong Feng, Chunyan Miao

    Abstract: Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes. Training with CAD enhances model robustness against spurious features that happen to correlate with labels by spreading the casual relationships across different classes. Yet, recent research reveals that training wit… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 main conference

    MSC Class: 68T50 ACM Class: I.2; I.2.7

  12. arXiv:2406.04151  [pdf, other

    cs.AI cs.CL

    AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

    Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervis… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project site: https://agentgym.github.io

  13. arXiv:2406.03046  [pdf, other

    cs.NE

    When Spiking neural networks meet temporal attention image decoding and adaptive spiking neuron

    Authors: Xuerui Qiu, Zheng Luan, Zhaorui Wang, Rui-Jie Zhu

    Abstract: Spiking Neural Networks (SNNs) are capable of encoding and processing temporal information in a biologically plausible way. However, most existing SNN-based methods for image tasks do not fully exploit this feature. Moreover, they often overlook the role of adaptive threshold in spiking neurons, which can enhance their dynamic behavior and learning ability. To address these issues, we propose a no… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ICLR23 tiny paper

  14. arXiv:2406.02974  [pdf

    cs.CL

    Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese

    Authors: Jingshen Zhang, Xinglu Chen, Xinying Qiu, Zhimin Wang, Wenhe Feng

    Abstract: Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques with lexcial simplification. RISS introduces two key components: (1) Readability-guided Paraphrase Se… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to the 23rd China National Conference on Computational Linguistics (CCL 2024)

  15. arXiv:2405.20882  [pdf, other

    cs.LG

    Sheaf HyperNetworks for Personalized Federated Learning

    Authors: Bao Nguyen, Lorenzo Sani, Xinchi Qiu, Pietro Liò, Nicholas D. Lane

    Abstract: Graph hypernetworks (GHNs), constructed by combining graph neural networks (GNNs) with hypernetworks (HNs), leverage relational data across various domains such as neural architecture search, molecular property prediction and federated learning. Despite GNNs and HNs being individually successful, we show that GHNs present problems compromising their performance, such as over-smoothing and heteroph… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures, 7 tables, pre-print under review

  16. arXiv:2405.16466  [pdf, other

    cs.NE

    High-Performance Temporal Reversible Spiking Neural Networks with $O(L)$ Training Memory and $O(1)$ Inference Cost

    Authors: JiaKui Hu, Man Yao, Xuerui Qiu, Yuhong Chou, Yuxuan Cai, Ning Qiao, Yonghong Tian, Bo XU, Guoqi Li

    Abstract: Multi-timestep simulation of brain-inspired Spiking Neural Networks (SNNs) boost memory requirements during training and increase inference energy cost. Current training methods cannot simultaneously solve both training and inference dilemmas. This work proposes a novel Temporal Reversible architecture for SNNs (T-RevSNN) to jointly address the training and inference challenges by altering the for… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML2024

  17. arXiv:2405.16258  [pdf, other

    cs.LG cs.AI eess.SY

    USD: Unsupervised Soft Contrastive Learning for Fault Detection in Multivariate Time Series

    Authors: Hong Liu, Xiuxiu Qiu, Yiming Shi, Zelin Zang

    Abstract: Unsupervised fault detection in multivariate time series is critical for maintaining the integrity and efficiency of complex systems, with current methodologies largely focusing on statistical and machine learning techniques. However, these approaches often rest on the assumption that data distributions conform to Gaussian models, overlooking the diversity of patterns that can manifest in both nor… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 19 pages, 7 figures, under review

  18. arXiv:2405.13868  [pdf, other

    cs.LG cs.CL

    Automatically Identifying Local and Global Circuits with Linear Computation Graphs

    Authors: Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He, Xipeng Qiu

    Abstract: Circuit analysis of any certain model behavior is a central task in mechanistic interpretability. We introduce our circuit discovery pipeline with sparse autoencoders (SAEs) and a variant called skip SAEs. With these two modules inserted into the model, the model's computation graph with respect to OV and MLP circuits becomes strictly linear. Our methods do not require linear approximation to comp… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  19. arXiv:2405.13845  [pdf, other

    cs.CL cs.AI

    Semantic Density: Uncertainty Quantification in Semantic Space for Large Language Models

    Authors: Xin Qiu, Risto Miikkulainen

    Abstract: With the widespread application of Large Language Models (LLMs) to various domains, concerns regarding the trustworthiness of LLMs in safety-critical scenarios have been raised, due to their unpredictable tendency to hallucinate and generate misinformation. Existing LLMs do not have an inherent functionality to provide the users with an uncertainty metric for each response it generates, making it… ▽ More

    Submitted 25 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Fix institution name

  20. arXiv:2405.13672  [pdf, other

    cs.CV

    Advancing Spiking Neural Networks towards Multiscale Spatiotemporal Interaction Learning

    Authors: Yimeng Shan, Malu Zhang, Rui-jie Zhu, Xuerui Qiu, Jason K. Eshraghian, Haicheng Qu

    Abstract: Recent advancements in neuroscience research have propelled the development of Spiking Neural Networks (SNNs), which not only have the potential to further advance neuroscience research but also serve as an energy-efficient alternative to Artificial Neural Networks (ANNs) due to their spike-driven characteristics. However, previous studies often neglected the multiscale information and its spatiot… ▽ More

    Submitted 27 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  21. arXiv:2405.13089  [pdf, other

    cs.LG

    SEGAN: semi-supervised learning approach for missing data imputation

    Authors: Xiaohua Pan, Weifeng Wu, Peiran Liu, Zhen Li, Peng Lu, Peijian Cao, Jianfeng Zhang, Xianfei Qiu, YangYang Wu

    Abstract: In many practical real-world applications, data missing is a very common phenomenon, making the development of data-driven artificial intelligence theory and technology increasingly difficult. Data completion is an important method for missing data preprocessing. Most existing miss-ing data completion models directly use the known information in the missing data set but ignore the impact of the da… ▽ More

    Submitted 12 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  22. arXiv:2405.12939  [pdf, other

    cs.CL

    Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

    Authors: Zhangyue Yin, Qiushi Sun, Qipeng Guo, Zhiyuan Zeng, Xiaonan Li, Tianxiang Sun, Cheng Chang, Qinyuan Cheng, Ding Wang, Xiaofeng Mou, Xipeng Qiu, XuanJing Huang

    Abstract: Recent advancements in Chain-of-Thought prompting have facilitated significant breakthroughs for Large Language Models (LLMs) in complex reasoning tasks. Current research enhances the reasoning performance of LLMs by sampling multiple reasoning chains and ensembling based on the answer frequency. However, this approach fails in scenarios where the correct answers are in the minority. We identify t… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 17 pages, 14 figures, accepted by LREC-COLING 2024

  23. arXiv:2405.10853  [pdf, other

    cs.LG cs.AI cs.DC

    The Future of Large Language Model Pre-training is Federated

    Authors: Lorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane

    Abstract: Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources we can leverage for pre-training. Federated learning (FL) has the potential to unl… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, pre-print

  24. arXiv:2405.02004  [pdf, other

    cs.CV

    M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation

    Authors: Yingshuang Zou, Yikang Ding, Xi Qiu, Haoqian Wang, Haotian Zhang

    Abstract: This paper presents a novel self-supervised two-frame multi-camera metric depth estimation network, termed M${^2}$Depth, which is designed to predict reliable scale-aware surrounding depth in autonomous driving. Unlike the previous works that use multi-view images from a single time-step or multiple time-step images from a single camera, M${^2}$Depth takes temporally adjacent two-frame images from… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

  25. arXiv:2404.16484  [pdf, other

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  26. arXiv:2404.14248  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin , et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  27. arXiv:2404.11864  [pdf, other

    cs.CV

    Progressive Multi-modal Conditional Prompt Tuning

    Authors: Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

    Abstract: Pre-trained vision-language models (VLMs) have shown remarkable generalization capabilities via prompting, which leverages VLMs as knowledge bases to extract information beneficial for downstream tasks. However, existing methods primarily employ uni-modal prompting, which only engages a uni-modal branch, failing to simultaneously adjust vision-language (V-L) features. Additionally, the one-pass fo… ▽ More

    Submitted 24 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  28. arXiv:2404.05600  [pdf, other

    cs.CL cs.SD eess.AS

    SpeechAlign: Aligning Speech Generation to Human Preferences

    Authors: Dong Zhang, Zhaowei Li, Shimin Li, Xin Zhang, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

    Abstract: Speech language models have significantly advanced in generating realistic speech, with neural codec language models standing out. However, the integration of human feedback to align speech outputs to human preferences is often neglected. This paper addresses this gap by first analyzing the distribution gap in codec language models, highlighting how it leads to discrepancies between the training a… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Work in progress

  29. arXiv:2404.03113  [pdf, other

    cs.AR

    QED: Scalable Verification of Hardware Memory Consistency

    Authors: Gokulan Ravi, Xiaokang Qiu, Mithuna Thottethodi, T. N. Vijaykumar

    Abstract: Memory consistency model (MCM) issues in out-of-order-issue microprocessor-based shared-memory systems are notoriously non-intuitive and a source of hardware design bugs. Prior hardware verification work is limited to in-order-issue processors, to proving the correctness only of some test cases, or to bounded verification that does not scale in practice beyond 7 instructions across all threads. Be… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 13 pages, 8 figures

  30. arXiv:2404.02655  [pdf, other

    cs.CL

    Calibrating the Confidence of Large Language Models by Eliciting Fidelity

    Authors: Mozhi Zhang, Mianqiu Huang, Rundong Shi, Linsen Guo, Chong Peng, Peng Yan, Yaqian Zhou, Xipeng Qiu

    Abstract: Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate. In this paper, we decompose the language model confidence into the \textit{Uncertainty} about the question and the… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 17 pages, 13 figures

  31. arXiv:2403.20150  [pdf, other

    cs.LG cs.AI cs.CY

    TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

    Authors: Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, Bin Yang

    Abstract: Time series are generated in diverse domains such as economic, traffic, health, and energy, where forecasting of future values has numerous important applications. Not surprisingly, many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. To achieve this, we propose TFB, an auto… ▽ More

    Submitted 18 June, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: Directly accepted by PVLDB 2024

  32. arXiv:2403.17297  [pdf, other

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  33. arXiv:2403.16952  [pdf, other

    cs.CL cs.AI cs.LG

    Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance

    Authors: Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu

    Abstract: Pretraining data of large language models composes multiple domains (e.g., web texts, academic papers, codes), whose mixture proportions crucially impact the competence of outcome models. While existing endeavors rely on heuristics or qualitative strategies to tune the proportions, we discover the quantitative predictability of model performance regarding the mixture proportions in function forms,… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  34. arXiv:2403.15100  [pdf, other

    cs.RO cs.AI

    Subequivariant Reinforcement Learning Framework for Coordinated Motion Control

    Authors: Haoyu Wang, Xiaoyu Tan, Xihe Qiu, Chao Qu

    Abstract: Effective coordination is crucial for motion control with reinforcement learning, especially as the complexity of agents and their motions increases. However, many existing methods struggle to account for the intricate dependencies between joints. We introduce CoordiGraph, a novel architecture that leverages subequivariant principles from physics to enhance coordination of motion control with rein… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 7 figures, 2024 IEEE International Conference on Robotics and Automation

  35. arXiv:2403.14734  [pdf, other

    cs.SE cs.AI cs.CL cs.PL

    A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

    Authors: Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

    Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronol… ▽ More

    Submitted 23 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 64 pages, 6 figures, 10 tables, 692 references

  36. arXiv:2403.14137  [pdf, other

    cs.CV cs.LG

    SynerMix: Synergistic Mixup Solution for Enhanced Intra-Class Cohesion and Inter-Class Separability in Image Classification

    Authors: Ye Xu, Ya Gao, Xiaorong Qiu, Yang Chen, Ying Ji

    Abstract: To address the issues of MixUp and its variants (e.g., Manifold MixUp) in image classification tasks-namely, their neglect of mixing within the same class (intra-class mixup) and their inadequacy in enhancing intra-class cohesion through their mixing operations-we propose a novel mixup method named SynerMix-Intra and, building upon this, introduce a synergistic mixup solution named SynerMix. Syner… ▽ More

    Submitted 24 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 25 pages,12 figures

  37. arXiv:2403.12407  [pdf, other

    cs.CL

    Cross-Lingual Transfer for Natural Language Inference via Multilingual Prompt Translator

    Authors: Xiaoyu Qiu, Yuechen Wang, Jiaxin Shi, Wengang Zhou, Houqiang Li

    Abstract: Based on multilingual pre-trained models, cross-lingual transfer with prompt learning has shown promising effectiveness, where soft prompt learned in a source language is transferred to target languages for downstream tasks, particularly in the low-resource scenario. To efficiently transfer soft prompt, we propose a novel framework, Multilingual Prompt Translator (MPT), where a multilingual prompt… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 6 pages, 5 figures, conference

  38. arXiv:2403.09631  [pdf, other

    cs.CV cs.AI cs.CL cs.RO

    3D-VLA: A 3D Vision-Language-Action Generative World Model

    Authors: Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

    Abstract: Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world. Furthermore, they perform action prediction by learning a direct mapping from perception to action, neglecting the vast dynamics of the world and the relations between actions and dynamics. In contrast, human beings are endowed with world models that depict imagination… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Project page: https://vis-www.cs.umass.edu/3dvla/

  39. arXiv:2403.06510  [pdf, other

    cs.CV

    Skeleton Supervised Airway Segmentation

    Authors: Mingyue Zhao, Han Li, Li Fan, Shiyuan Liu, Xiaolan Qiu, S. Kevin Zhou

    Abstract: Fully-supervised airway segmentation has accomplished significant triumphs over the years in aiding pre-operative diagnosis and intra-operative navigation. However, full voxel-level annotation constitutes a labor-intensive and time-consuming task, often plagued by issues such as missing branches, branch annotation discontinuity, or erroneous edge delineation. label-efficient solutions for airway e… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  40. arXiv:2403.06243  [pdf, other

    cs.CV

    BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering

    Authors: Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Pingyu Wang, Xuecheng Nie

    Abstract: Developing blind video deflickering (BVD) algorithms to enhance video temporal consistency, is gaining importance amid the flourish of image processing and video generation. However, the intricate nature of video data complicates the training of deep learning methods, leading to high resource consumption and instability, notably under severe lighting flicker. This underscores the critical need for… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  41. arXiv:2403.03558  [pdf, other

    cs.CL

    Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem

    Authors: Yuhong Sun, Zhangyue Yin, Qipeng Guo, Jiawen Wu, Xipeng Qiu, Hui Zhao

    Abstract: Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks. However, they are susceptible to producing unreliable conjectures in ambiguous contexts called hallucination. This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP). To support this approach, we innovatively de… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 11 pages, 8 figures, accepted by LREC-Coling 2024

  42. arXiv:2403.02757  [pdf, other

    cs.CL

    In-Memory Learning: A Declarative Learning Framework for Large Language Models

    Authors: Bo Wang, Tianxiang Sun, Hang Yan, Siyin Wang, Qingyuan Cheng, Xipeng Qiu

    Abstract: The exploration of whether agents can align with their environment without relying on human-labeled data presents an intriguing research topic. Drawing inspiration from the alignment process observed in intelligent organisms, where declarative memory plays a pivotal role in summarizing past experiences, we propose a novel learning framework. The agents adeptly distill insights from past experience… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  43. arXiv:2403.01456  [pdf

    cs.CL cs.AI cs.CY

    Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment

    Authors: Jingshen Zhang, Jiajun Xie, Xinying Qiu

    Abstract: Item difficulty plays a crucial role in adaptive testing. However, few works have focused on generating questions of varying difficulty levels, especially for multiple-choice (MC) cloze tests. We propose training pre-trained language models (PLMs) as surrogate models to enable item response theory (IRT) assessment, avoiding the need for human test subjects. We also propose two strategies to contro… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

  44. arXiv:2403.00793  [pdf, other

    cs.IR cs.LG

    Ad Recommendation in a Collapsed and Entangled World

    Authors: Junwei Pan, Wei Xue, Ximei Wang, Haibin Yu, Xun Liu, Shijie Quan, Xueming Qiu, Dapeng Liu, Lei Xiao, Jie Jiang

    Abstract: In this paper, we present an industry ad recommendation system, paying attention to the challenges and practices of learning appropriate representations. Our study begins by showcasing our approaches to preserving priors when encoding features of diverse types into embedding representations. Specifically, we address sequence features, numeric features, pre-trained embedding features, as well as sp… ▽ More

    Submitted 22 February, 2024; originally announced March 2024.

  45. arXiv:2403.00270  [pdf, other

    cs.NE cs.CV

    Event-Driven Learning for Spiking Neural Networks

    Authors: Wenjie Wei, Malu Zhang, Jilin Zhang, Ammar Belatreche, Jibin Wu, Zijing Xu, Xuerui Qiu, Hong Chen, Yang Yang, Haizhou Li

    Abstract: Brain-inspired spiking neural networks (SNNs) have gained prominence in the field of neuromorphic computing owing to their low energy consumption during feedforward inference on neuromorphic hardware. However, it remains an open challenge how to effectively benefit from the sparse event-driven property of SNNs to minimize backpropagation learning costs. In this paper, we conduct a comprehensive ex… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

  46. arXiv:2402.17463  [pdf, other

    cs.CL

    Training-Free Long-Context Scaling of Large Language Models

    Authors: Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong

    Abstract: The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training. By… ▽ More

    Submitted 29 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  47. arXiv:2402.17433  [pdf, other

    cs.CL

    Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder

    Authors: Jiaqi Wang, Zhenxi Song, Zhengyu Ma, Xipeng Qiu, Min Zhang, Zhiguo Zhang

    Abstract: Reconstructing natural language from non-invasive electroencephalography (EEG) holds great promise as a language decoding technology for brain-computer interfaces (BCIs). However, EEG-based language decoding is still in its nascent stages, facing several technical issues such as: 1) Absence of a hybrid strategy that can effectively integrate cross-modality (between EEG and text) self-learning with… ▽ More

    Submitted 10 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 8 pages (excluding references), accepted by ACL 2024 Main Conference

  48. arXiv:2402.16319  [pdf, other

    cs.CL

    Data-freeWeight Compress and Denoise for Large Language Models

    Authors: Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin

    Abstract: Large Language Models (LLMs) are reshaping the research landscape in artificial intelligence, particularly as model parameters scale up significantly, unlocking remarkable capabilities across various domains. Nevertheless, the scalability of model parameters faces constraints due to limitations in GPU memory and computational speed. To address these constraints, various weight compression methods… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  49. arXiv:2402.15745  [pdf, other

    cs.CL cs.AI cs.CV

    GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation

    Authors: Yi Zong, Xipeng Qiu

    Abstract: The Large Vision-Language Models (LVLMs) have demonstrated great abilities in image perception and language understanding. However, existing multimodal benchmarks focus on primary perception abilities and commonsense knowledge which are insufficient to reflect the comprehensive capabilities of LVLMs. We propose GAOKAO-MM, a multimodal benchmark based on the Chinese College Entrance Examination (GA… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  50. arXiv:2402.15147  [pdf, other

    cs.CR cs.LG

    TREC: APT Tactic / Technique Recognition via Few-Shot Provenance Subgraph Learning

    Authors: Mingqi Lv, HongZhe Gao, Xuebo Qiu, Tieming Chen, Tiantian Zhu

    Abstract: APT (Advanced Persistent Threat) with the characteristics of persistence, stealth, and diversity is one of the greatest threats against cyber-infrastructure. As a countermeasure, existing studies leverage provenance graphs to capture the complex relations between system entities in a host for effective APT detection. In addition to detecting single attack events as most existing work does, underst… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.