
Showing 1–50 of 115 results for author: Shang, L

Searching in archive cs.
  1. arXiv:2406.16144  [pdf, other]

    cs.CL

    Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step

    Authors: Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Current research found the issue of Early Answering in large language models (LLMs), where the models already have an answer before generating the Chain-of-Thought (CoT). This phenomenon suggests a potential lack of necessary dependency between the predicted answer and the reasoning process. Consequently, two important questions arise: (1) Is CoT still necessary if the model already has an answer?…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.10278  [pdf, other]

    cs.CL cs.AI

    Prompt-Based Length Controlled Generation with Multiple Control Types

    Authors: Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Large language models (LLMs) have attracted great attention given their strong performance on a wide range of NLP tasks. In practice, users often expect generated texts to fall within a specific length range, making length controlled generation an important topic, especially for GPT-style models. Existing length control methods mostly focus on a simple control type of "equal to" a target length. D…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 findings. arXiv admin note: text overlap with arXiv:2308.12030

  3. arXiv:2406.09815  [pdf, other]

    cs.CL cs.AI

    Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

    Authors: Zhenrui Yue, Huimin Zeng, Lanyu Shang, Yifan Liu, Yang Zhang, Dong Wang

    Abstract: The rapid propagation of misinformation poses substantial risks to public interest. To combat misinformation, large language models (LLMs) are adapted to automatically verify claim credibility. Nevertheless, existing methods heavily rely on the embedded knowledge within LLMs and / or black-box APIs for evidence collection, leading to subpar performance with smaller LLMs or upon unreliable context.…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  4. arXiv:2405.19010  [pdf, other]

    cs.CL cs.AI cs.IR

    Evaluating the External and Parametric Knowledge Fusion of Large Language Models

    Authors: Hao Zhang, Yuyang Zhang, Xiaoguang Li, Wenxuan Shi, Haonan Xu, Huanshuo Liu, Yasheng Wang, Lifeng Shang, Qun Liu, Yong Liu, Ruiming Tang

    Abstract: Integrating external knowledge into large language models (LLMs) presents a promising solution to overcome the limitations imposed by their antiquated and static parametric memory. Prior studies, however, have tended to over-reliance on external knowledge, underestimating the valuable contributions of an LLMs' intrinsic parametric knowledge. The efficacy of LLMs in blending external and parametric…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 15 pages, 3 figures, 3 tables

  5. arXiv:2405.18347  [pdf, other]

    cs.LG

    Dataset Growth

    Authors: Ziheng Qin, Zhaopan Xu, Yukun Zhou, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Xiaojiang Peng, Radu Timofte, Hongxun Yao, Kai Wang, Yang You

    Abstract: Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is impractical to do manual cleaning against noise and redundancy given today's data scale. There are existing techniques for cleaning/selecting the collected data. H…

    Submitted 23 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.20087 by other authors

  6. arXiv:2405.07527  [pdf, other]

    cs.LG cs.AI

    Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

    Authors: Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang

    Abstract: Despite their prevalence in deep-learning communities, over-parameterized models convey high demands of computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down into network modules, such as heads in self-atten…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2023

  7. arXiv:2403.14952  [pdf, other]

    cs.CL cs.AI

    Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation

    Authors: Zhenrui Yue, Huimin Zeng, Yimeng Lu, Lanyu Shang, Yang Zhang, Dong Wang

    Abstract: The proliferation of online misinformation has posed significant threats to public interest. While numerous online users actively participate in the combat against misinformation, many of such responses can be characterized by the lack of politeness and supporting facts. As a solution, text generation approaches are proposed to automatically produce counter-misinformation responses. Nevertheless,…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024

  8. arXiv:2403.06775  [pdf, other]

    cs.CV

    FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

    Authors: Pengchong Qiao, Lei Shang, Chang Liu, Baigui Sun, Xiangyang Ji, Jie Chen

    Abstract: Subject-driven generation has garnered significant interest recently due to its ability to personalize text-to-image generation. Typical works focus on learning the new subject's private attributes. However, an important fact has not been taken seriously that a subject is not an isolated new concept but should be a specialization of a certain category in the pre-trained model. This results in the…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: accepted by CVPR2024

  9. arXiv:2402.14488  [pdf, other]

    cs.CL

    Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer

    Authors: Xinshuo Hu, Baotian Hu, Dongfang Li, Xiaoguang Li, Lifeng Shang

    Abstract: The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge, regardless of alterations in the context. Previous research has predominantly focused on examining hallucinations stemming from static input, such as in the domains of summarization or machine translation. However, our investigation de…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: LREC-Coling 2024

  10. arXiv:2402.11905  [pdf, other]

    cs.CL

    Learning to Edit: Aligning LLMs with Knowledge Editing

    Authors: Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang

    Abstract: Knowledge editing techniques, aiming to efficiently modify a minor proportion of knowledge in large language models (LLMs) without negatively impacting performance across other inputs, have garnered widespread attention. However, existing methods predominantly rely on memorizing the updated knowledge, impeding LLMs from effectively combining the new knowledge with their inherent knowledge when ans…

    Submitted 5 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 17 pages, 8 figures, 9 tables. ACL 2024 main camera-ready version

  11. arXiv:2402.08426  [pdf, other]

    cs.IR cs.LG

    Frequency-aware Graph Signal Processing for Collaborative Filtering

    Authors: Jiafeng Xia, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, Li Shang, Ning Gu

    Abstract: Graph Signal Processing (GSP) based recommendation algorithms have recently attracted lots of attention due to its high efficiency. However, these methods failed to consider the importance of various interactions that reflect unique user/item characteristics and failed to utilize user and item high-order neighborhood information to model user preference, thus leading to sub-optimal performance. To…

    Submitted 13 February, 2024; originally announced February 2024.

  12. arXiv:2401.17167  [pdf, other]

    cs.CL

    Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

    Authors: Shijue Huang, Wanjun Zhong, Jianqiao Lu, Qi Zhu, Jiahui Gao, Weiwen Liu, Yutai Hou, Xingshan Zeng, Yasheng Wang, Lifeng Shang, Xin Jiang, Ruifeng Xu, Qun Liu

    Abstract: The recent trend of using Large Language Models (LLMs) as tool agents in real-world applications underscores the necessity for comprehensive evaluations of their capabilities, particularly in complex scenarios involving planning, creating, and using tools. However, existing benchmarks typically focus on simple synthesized queries that do not reflect real-world complexity, thereby offering limited…

    Submitted 3 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted by ACL2024 Findings

  13. arXiv:2401.16745  [pdf, other]

    cs.CL

    MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models

    Authors: Wai-Chung Kwan, Xingshan Zeng, Yuxin Jiang, Yufei Wang, Liangyou Li, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Large language models (LLMs) are increasingly relied upon for complex multi-turn conversations across diverse real-world applications. However, existing benchmarks predominantly focus on single-turn evaluations, overlooking the models' capabilities in multi-turn interactions. To address this gap, we introduce MT-Eval, a comprehensive benchmark designed to evaluate multi-turn conversational abiliti…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Code and data are available at https://github.com/KwanWaiChung/MT-Eval

  14. arXiv:2401.15670  [pdf, other]

    cs.CL cs.AI cs.LG

    YODA: Teacher-Student Progressive Learning for Language Models

    Authors: Jianqiao Lu, Wanjun Zhong, Yufei Wang, Zhijiang Guo, Qi Zhu, Wenyong Huang, Yanlin Wang, Fei Mi, Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind human learning efficiency. This disparity is often linked to the inherent human capacity to learn from basic examples, gradually generalize and handle more complex problems, and refine their skills with continuous feedback. Inspired by this, this paper introduces YODA, a novel teacher-stude…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: 14 pages, 4 figures, 3 tables

  15. arXiv:2401.15042  [pdf, other]

    cs.CL cs.AI

    PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

    Authors: Haochen Tan, Zhijiang Guo, Zhan Shi, Lu Xu, Zhili Liu, Yunlong Feng, Xiaoguang Li, Yasheng Wang, Lifeng Shang, Qun Liu, Linqi Song

    Abstract: Large Language Models (LLMs) have succeeded remarkably in understanding long-form contents. However, exploring their capability for generating long-form contents, such as reports and articles, has been relatively unexplored and inadequately assessed by existing benchmarks. The prevalent evaluation methods, which predominantly rely on crowdsourcing, are recognized for their labor-intensive nature a…

    Submitted 4 June, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024 main conference

  16. arXiv:2401.09192  [pdf, other]

    cs.LG cs.AI

    Preparing Lessons for Progressive Training on Language Models

    Authors: Yu Pan, Ye Yuan, Yichun Yin, Jiaxin Shi, Zenglin Xu, Ming Zhang, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions due to growing model sizes. Prior work suggests using pretrained small models to improve training efficiency, but this approach may not be suitable for new model structures. On the other hand, training from scratch can be slow, and progressively stacking…

    Submitted 10 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  17. arXiv:2312.01700  [pdf, other]

    cs.CL cs.AI

    Data Management For Large Language Models: A Survey

    Authors: Zige Wang, Wanjun Zhong, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Data plays a fundamental role in the training of Large Language Models (LLMs). Effective data management, particularly in the formulation of a well-suited training dataset, holds significance for enhancing model performance and improving training efficiency during pretraining and supervised fine-tuning phases. Despite the considerable importance of data management, the current research community s…

    Submitted 25 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Work in progress

  18. arXiv:2311.18251  [pdf, other]

    cs.HC

    Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground

    Authors: Zhenyu Xu, Hailin Xu, Zhouyang Lu, Yingying Zhao, Rui Zhu, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang

    Abstract: Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, it takes more than LLMs to enable chatbots that can act as companions. Humans use their understanding of individual personalities to driv…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 36 pages, 25 figures, Under review at ACM IMWUT

  19. arXiv:2311.04625  [pdf, other]

    cs.IR

    A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction

    Authors: Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Li Shang, Ning Gu

    Abstract: Click-through rate (CTR) prediction is widely used in academia and industry. Most CTR tasks fall into a feature embedding & feature interaction paradigm, where the accuracy of CTR prediction is mainly improved by designing practical feature interaction structures. However, recent studies have argued that the fixed feature embedding learned only through the embedding layer limits the performance o…

    Submitted 1 December, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  20. arXiv:2310.20410  [pdf, other]

    cs.CL

    FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models

    Authors: Yuxin Jiang, Yufei Wang, Xingshan Zeng, Wanjun Zhong, Liangyou Li, Fei Mi, Lifeng Shang, Xin Jiang, Qun Liu, Wei Wang

    Abstract: The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating pure response quality, rather than assessing whether the response follows constraints stated in the instruction. To fill this research gap, in this paper, we propose FollowBench, a Multi-level Fine-grained Constraints Following…

    Submitted 5 June, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 22 pages, 11 figures, 16 tables. ACL 2024 main camera-ready version

  21. arXiv:2310.19240  [pdf, other]

    cs.CL

    M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models

    Authors: Wai-Chung Kwan, Xingshan Zeng, Yufei Wang, Yusen Sun, Liangyou Li, Lifeng Shang, Qun Liu, Kam-Fai Wong

    Abstract: Managing long sequences has become an important and necessary feature for large language models (LLMs). However, it is still an open question of how to comprehensively and systematically evaluate the long-sequence capability of LLMs. One of the reasons is that conventional and widely-used benchmarks mainly consist of short sequences. In this paper, we propose M4LE, a Multi-ability, Multi-range, Mu…

    Submitted 27 July, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: Code and data are available at https://github.com/KwanWaiChung/M4LE

  22. arXiv:2310.10699  [pdf, other]

    cs.LG cs.AI

    Reusing Pretrained Models by Multi-linear Operators for Efficient Training

    Authors: Yu Pan, Ye Yuan, Yichun Yin, Zenglin Xu, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Training large models from scratch usually costs a substantial amount of resources. Towards this problem, recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model (termed the "target model"), leading to a considerable acceleration in training. Despite the successes of these previous studies, they grew pretrained models by mapping partial weights o…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted in NeurIPS 2023

  23. arXiv:2310.10477  [pdf, other]

    cs.CL cs.AI cs.LG

    Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

    Authors: Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges. This becomes particularly evident when LLMs inadvertently generate harmful or toxic content, either unintentionally or because of intentional inducement. Existing alignment methods usually direct LLMs toward the favorable outcomes by utilizing human-annotate…

    Submitted 16 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024

  24. arXiv:2310.08372  [pdf, other]

    cs.CL

    Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment

    Authors: Boyang Xue, Weichao Wang, Hongru Wang, Fei Mi, Rui Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Pretrained language models (PLMs) based knowledge-grounded dialogue systems are prone to generate responses that are factually inconsistent with the provided knowledge source. In such inconsistent responses, the dialogue models fail to accurately express the external knowledge they rely upon. Inspired by previous work which identified that feed-forward networks (FFNs) within Transformers are respo…

    Submitted 3 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: EMNLP2023 Findings

  25. arXiv:2310.04960  [pdf, other]

    cs.CL

    Exploring the Usage of Chinese Pinyin in Pretraining

    Authors: Baojun Wang, Kun Xu, Lifeng Shang

    Abstract: Unlike alphabetic languages, Chinese spelling and pronunciation are different. Both characters and pinyin take an important role in Chinese language understanding. In Chinese NLP tasks, we almost adopt characters or words as model input, and few works study how to use pinyin. However, pinyin is essential in many scenarios, such as error correction and fault tolerance for ASR-introduced errors. Mos…

    Submitted 7 October, 2023; originally announced October 2023.

  26. arXiv:2310.00533  [pdf, other]

    cs.CL cs.AI cs.LG

    SELF: Self-Evolution with Language Feedback

    Authors: Jianqiao Lu, Wanjun Zhong, Wenyong Huang, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Weichao Wang, Xingshan Zeng, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable versatility across various domains. To further advance LLMs, we propose 'SELF' (Self-Evolution with Language Feedback), a novel approach that enables LLMs to self-improve through self-reflection, akin to human learning processes. SELF initiates with a meta-skill learning process that equips the LLMs with capabilities for self-feedback and s…

    Submitted 1 February, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: 20 pages, 4 figures, 11 tables

  27. arXiv:2308.14256  [pdf, other]

    cs.CV cs.AI

    FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

    Authors: Yang Liu, Cheng Yu, Lei Shang, Yongyi He, Ziheng Wu, Xingjun Wang, Chao Xu, Haoyu Xie, Weida Wang, Yuze Zhao, Lin Zhu, Chen Cheng, Weitao Chen, Yuan Yao, Wenmeng Zhou, Jiaqi Xu, Qiang Wang, Yingda Chen, Xuansong Xie, Baigui Sun

    Abstract: Recent advancement in personalized image generation have unveiled the intriguing capability of pre-trained text-to-image models on learning identity information from a collection of portrait images. However, existing solutions are vulnerable in producing truthful details, and usually suffer from several defects such as (i) The generated face exhibit its own unique characteristics, i.e., facial shape…

    Submitted 13 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: This is an ongoing work that will be consistently refined and improved upon

  28. arXiv:2308.12030  [pdf, other]

    cs.CL cs.AI cs.LG

    Prompt-Based Length Controlled Generation with Reinforcement Learning

    Authors: Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks. Length controlled generation of LLMs emerges as an important topic, which enables users to fully leverage the capability of LLMs in more real-world scenarios like generating a proper answer or essay of a desired length. In addition, the autoregressive…

    Submitted 30 September, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

  29. AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models

    Authors: Siheng Li, Cheng Yang, Yichun Yin, Xinyu Zhu, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang

    Abstract: Information-seeking conversation, which aims to help users gather information through conversation, has achieved great progress in recent years. However, the research is still stymied by the scarcity of training data. To alleviate this problem, we propose AutoConv for synthetic conversation generation, which takes advantage of the few-shot learning ability and generation capacity of large language…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Accepted to ACL 2023 Main Conference (Short)

  30. NewsDialogues: Towards Proactive News Grounded Conversation

    Authors: Siheng Li, Yichun Yin, Cheng Yang, Wangjie Jiang, Yiwei Li, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang

    Abstract: Hot news is one of the most popular topics in daily conversations. However, news grounded conversation has long been stymied by the lack of well-designed task definition and scarce data. In this paper, we propose a novel task, Proactive News Grounded Conversation, in which a dialogue system can proactively lead the conversation based on some key topics of the news. In addition, both information-se…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Accepted to ACL 2023 Conference (Long Paper; Findings)

  31. arXiv:2308.05872  [pdf, other]

    cs.CV

    Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention

    Authors: Liang Shang, Yanli Liu, Zhengyang Lou, Shuxue Quan, Nagesh Adluru, Bochen Guan, William A. Sethares

    Abstract: Convolutional neural networks (CNNs) and vision transformers (ViTs) have achieved remarkable success in various vision tasks. However, many architectures do not consider interactions between feature maps from different stages and scales, which may limit their performance. In this work, we propose a simple add-on attention module to overcome these limitations via multi-stage and cross-scale interac…

    Submitted 14 August, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

  32. arXiv:2307.15960  [pdf, other]

    cs.IR cs.LG

    Recommendation Unlearning via Matrix Correction

    Authors: Jiahao Liu, Dongsheng Li, Hansu Gu, Tun Lu, Jiongran Wu, Peng Zhang, Li Shang, Ning Gu

    Abstract: Recommender systems are important for providing personalized services to users, but the vast amount of collected user data has raised concerns about privacy (e.g., sensitive data), security (e.g., malicious data) and utility (e.g., toxic data). To address these challenges, recommendation unlearning has emerged as a promising approach, which allows specific data and models to be forgotten, mitigati…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: 14 pages, under review

  33. arXiv:2307.12966  [pdf, other]

    cs.CL

    Aligning Large Language Models with Human: A Survey

    Authors: Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect (hallucinated) information. Hence, aligning LLMs w…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: work in progress

  34. arXiv:2305.16481  [pdf, other]

    cs.CV

    SimHaze: game engine simulated data for real-world dehazing

    Authors: Zhengyang Lou, Huan Xu, Fangzhou Mu, Yanli Liu, Xiaoyu Zhang, Liang Shang, Jiang Li, Bochen Guan, Yin Li, Yu Hen Hu

    Abstract: Deep models have demonstrated recent success in single-image dehazing. Most prior methods consider fully supervised training and learn from paired clean and hazy images, where a hazy image is synthesized based on a clean image and its estimated depth map. This paradigm, however, can produce low-quality hazy images due to inaccurate depth estimation, resulting in poor generalization of the trained…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Submitted to ICIP 2023

  35. arXiv:2305.14103  [pdf, other]

    cs.AI cs.HC cs.IR

    Simulating News Recommendation Ecosystem for Fun and Profit

    Authors: Guangping Zhang, Dongsheng Li, Hansu Gu, Tun Lu, Li Shang, Ning Gu

    Abstract: Understanding the evolution of online news communities is essential for designing more effective news recommender systems. However, due to the lack of appropriate datasets and platforms, the existing literature is limited in understanding the impact of recommender systems on this evolutionary process and the underlying mechanisms, resulting in sub-optimal system designs that may affect long-term u…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Post-publication copyright may be transferred with notice, after which this version may no longer be accessible

  36. arXiv:2305.12851  [pdf, other]

    cs.CL cs.AI

    Enhancing Coherence of Extractive Summarization with Multitask Learning

    Authors: Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: This study proposes a multitask learning architecture for extractive summarization with coherence boosting. The architecture contains an extractive summarizer and coherent discriminator module. The coherent discriminator is trained online on the sentence vectors of the augmented textual input, thus improving its general ability of judging whether the input sentences are coherent. Meanwhile, we max…

    Submitted 21 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 11 pages, 4 figures

  37. arXiv:2305.12692  [pdf, other]

    cs.CL cs.AI

    MetaAdapt: Domain Adaptive Few-Shot Misinformation Detection via Meta Learning

    Authors: Zhenrui Yue, Huimin Zeng, Yang Zhang, Lanyu Shang, Dong Wang

    Abstract: With emerging topics (e.g., COVID-19) on social media as a source for the spreading misinformation, overcoming the distributional shifts between the original training domain (i.e., source domain) and such target domains remains a non-trivial task for misinformation detection. This presents an elusive challenge for early-stage misinformation detection, where a good amount of data and annotations fr…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  38. arXiv:2305.03711  [pdf, other]

    cs.LG cs.CY

    Medical records condensation: a roadmap towards healthcare data democratisation

    Authors: Yujiang Wang, Anshul Thakur, Mingzhi Dong, Pingchuan Ma, Stavros Petridis, Li Shang, Tingting Zhu, David A. Clifton

    Abstract: The prevalence of artificial intelligence (AI) has envisioned an era of healthcare democratisation that promises every stakeholder a new and better way of life. However, the advancement of clinical AI research is significantly hurdled by the dearth of data democratisation in healthcare. To truly democratise data for AI studies, challenges are two-fold: 1. the sensitive information in clinical data…

    Submitted 8 January, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

  39. arXiv:2304.11528  [pdf, other]

    cs.IR cs.LG

    Triple Structural Information Modelling for Accurate, Explainable and Interactive Recommendation

    Authors: Jiahao Liu, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, Li Shang, Ning Gu

    Abstract: In dynamic interaction graphs, user-item interactions usually follow heterogeneous patterns, represented by different structural information, such as user-item co-occurrence, sequential information of user interactions and the transition probabilities of item pairs. However, the existing methods cannot simultaneously leverage all three structural information, resulting in suboptimal performance. T…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: 10 pages, Accepted by SIGIR 2023

  40. arXiv:2303.04947  [pdf, other]

    cs.CV

    InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

    Authors: Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Zhaopan Xu, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You

    Abstract: Data pruning aims to obtain lossless performances with less overall cost. A common approach is to filter out samples that make less contribution to the training. This could lead to gradient expectation bias compared to the original data. To solve this problem, we propose InfoBatch, a novel framework aiming to achieve lossless training acceleration by unbiased dynamic data pruning. Specifi…

    Submitted 20 October, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: The second version of InfoBatch; we extend it to SSL and LLM tasks

  41. arXiv:2302.03038  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Single Cells Are Spatial Tokens: Transformers for Spatial Transcriptomic Data Imputation

    Authors: Hongzhi Wen, Wenzhuo Tang, Wei Jin, Jiayuan Ding, Renming Liu, Xinnan Dai, Feng Shi, Lulu Shang, Hui Liu, Yuying Xie

    Abstract: Spatially resolved transcriptomics brings exciting breakthroughs to single-cell analysis by providing physical locations along with gene expression. However, as a cost of the extremely high spatial resolution, the cellular level spatial transcriptomic data suffer significantly from missing values. While a standard solution is to perform imputation on the missing values, most existing methods eithe…

    Submitted 16 February, 2024; v1 submitted 5 February, 2023; originally announced February 2023.

  42. Personalized Graph Signal Processing for Collaborative Filtering

    Authors: Jiahao Liu, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, Li Shang, Ning Gu

    Abstract: The collaborative filtering (CF) problem with only user-item interaction information can be solved by graph signal processing (GSP), which uses low-pass filters to smooth the observed interaction signals on the similarity graph to obtain the prediction signals. However, the interaction signal may not be sufficient to accurately characterize user interests and the low-pass filters may ignore the us…

    Submitted 4 February, 2023; originally announced February 2023.

    Comments: Accepted by WWW 2023, 9 pages

  43. arXiv:2212.10257  [pdf, other]

    cs.CL

    Original or Translated? On the Use of Parallel Data for Translation Quality Estimation

    Authors: Baopu Qiu, Liang Ding, Di Wu, Lin Shang, Yibing Zhan, Dacheng Tao

    Abstract: Machine Translation Quality Estimation (QE) is the task of evaluating translation output in the absence of human-written references. Due to the scarcity of human-labeled QE data, previous works attempted to utilize the abundant unlabeled parallel corpora to produce additional training data with pseudo labels. In this paper, we demonstrate a significant gap between parallel data and real QE data: f…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: work in progress

  44. arXiv:2212.07699  [pdf, other]

    cs.CL cs.AI cs.CV

    Retrieval-based Disentangled Representation Learning with Natural Language Supervision

    Authors: Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Lei Chen

    Abstract: Disentangled representation learning remains challenging as the underlying factors of variation in the data do not naturally exist. The inherent complexity of real-world data makes it unfeasible to exhaustively enumerate and encapsulate all its variations within a finite set of factors. However, it is worth noting that most real-world data have linguistic equivalents, typically in the form of text…

    Submitted 10 February, 2024; v1 submitted 15 December, 2022; originally announced December 2022.

  45. arXiv:2212.03613  [pdf, other]

    cs.CL

    G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks

    Authors: Zhongwei Wan, Yichun Yin, Wei Zhang, Jiaxin Shi, Lifeng Shang, Guangyong Chen, Xin Jiang, Qun Liu

    Abstract: Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to forget the previous general knowledge acquired by general PLMs, which leads to a catastrophic forgetting…

    Submitted 17 February, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: EMNLP 2022, Long paper, Main conference

  46. arXiv:2212.01810  [pdf, other]

    cs.CL

    Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation

    Authors: Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang

    Abstract: Large pretrained language models can easily produce toxic or biased content, which is prohibitive for practical use. In order to detect such toxic generations, existing methods rely on templates, real-world data extraction, crowdsourcing workers, or automatic generation to construct adversarial contexts that are likely to induce toxic generations. However, what type of context is more likely to in…

    Submitted 4 December, 2022; originally announced December 2022.

    Comments: Findings of EMNLP 2022

  47. arXiv:2212.01015  [pdf, other]

    cs.CV cs.AI

    Improving Training and Inference of Face Recognition Models via Random Temperature Scaling

    Authors: Lei Shang, Mouxiao Huang, Wu Shi, Yuchen Liu, Yang Liu, Fei Wang, Baigui Sun, Xuansong Xie, Yu Qiao

    Abstract: Data uncertainty is commonly observed in the images for face recognition (FR). However, deep learning algorithms often make predictions with high confidence even for uncertain or irrelevant inputs. Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples. Taking a probabilistic view of the current classification model, th…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: AAAI 2023

  48. arXiv:2211.15478  [pdf, other]

    cs.LG cs.AI

    EVNet: An Explainable Deep Network for Dimension Reduction

    Authors: Zelin Zang, Shenghui Cheng, Linyan Lu, Hanchen Xia, Liangyu Li, Yaoting Sun, Yongjie Xu, Lei Shang, Baigui Sun, Stan Z. Li

    Abstract: Dimension reduction (DR) is commonly utilized to capture the intrinsic structure and transform high-dimensional data into low-dimensional space while retaining meaningful properties of the original data. It is used in various applications, such as image recognition, single-cell sequencing analysis, and biomarker discovery. However, contemporary parametric-free and parametric DR techniques suffer f…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: 18 pages, 15 figures, accepted by TVCG

  49. arXiv:2211.11262  [pdf, other]

    cs.CV

    Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All-in-One Classifier

    Authors: Zelin Zang, Lei Shang, Senqiao Yang, Fei Wang, Baigui Sun, Xuansong Xie, Stan Z. Li

    Abstract: Unsupervised domain adaptation (UDA) has proven to be highly effective in transferring knowledge from a label-rich source domain to a label-scarce target domain. However, the presence of additional novel categories in the target domain has led to the development of open-set domain adaptation (ODA) and universal domain adaptation (UNDA). Existing ODA and UNDA methods treat all novel categories as a…

    Submitted 23 July, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted by ICCV

  50. arXiv:2211.00409  [pdf, other]

    cs.CV

    Oracle-guided Contrastive Clustering

    Authors: Mengdie Wang, Liyuan Shang, Suyun Zhao, Yiming Wang, Hong Chen, Cuiping Li, Xizhao Wang

    Abstract: Deep clustering aims to learn a clustering representation through deep architectures. Most of the existing methods usually conduct clustering with the unique goal of maximizing clustering performance, which ignores the personalized demand of clustering tasks. However, in real scenarios, oracles may tend to cluster unlabeled data by exploiting distinct…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: 8 pages, 4 figures