Showing 1–46 of 46 results for author: Quan, X

Searching in archive cs.
  1. arXiv:2407.19807  [pdf, other]

    cs.CL

    Cool-Fusion: Fuse Large Language Models without Training

    Authors: Cong Liu, Xiaojun Quan, Yan Pan, Liang Lin, Weigang Wu, Xu Chen

    Abstract: We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to facilitate their complementary strengths. One of the challenges on model fusion is high computational load, i.e. to fine-tune or to align vocabularies via combinatorial optimization. To this end, we propose Cool-Fusion, a simple yet effective approach that fuses the knowledge of heterogeneous source…

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2406.19741  [pdf, other]

    cs.RO cs.AI

    ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

    Authors: Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

    Abstract: We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connect…

    Submitted 12 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This document contains 26 pages and 13 figures

  3. arXiv:2406.10813  [pdf, other]

    cs.CL

    Self-Evolution Fine-Tuning for Policy Optimization

    Authors: Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan

    Abstract: The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement lea…

    Submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2406.10594  [pdf, other]

    cs.CL

    BlockPruner: Fine-grained Pruning for Large Language Models

    Authors: Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li

    Abstract: With the rapid growth in the size and complexity of large language models (LLMs), the costs associated with their training and inference have escalated significantly. Research indicates that certain layers in LLMs harbor substantial redundancy, and pruning these layers has minimal impact on the overall performance. While various layer pruning methods have been developed based on this insight, they…

    Submitted 20 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  5. arXiv:2405.01379  [pdf, other]

    cs.CL

    Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving

    Authors: Xin Quan, Marco Valentino, Louise A. Dennis, André Freitas

    Abstract: Natural language explanations have become a proxy for evaluating explainable and multi-step Natural Language Inference (NLI) models. However, assessing the validity of explanations for NLI is challenging as it typically involves the crowd-sourcing of apposite datasets, a process that is time-consuming and prone to logical errors. To address existing limitations, this paper investigates the verific…

    Submitted 7 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  6. arXiv:2404.04386  [pdf, other]

    cs.SD eess.AS

    "It is okay to be uncommon": Quantizing Sound Event Detection Networks on Hardware Accelerators with Uncommon Sub-Byte Support

    Authors: Yushu Wu, Xiao Quan, Mohammad Rasool Izadi, Chuan-Che Huang

    Abstract: If our noise-canceling headphones can understand our audio environments, they can then inform us of important sound events, tune equalization based on the types of content we listen to, and dynamically adjust noise cancellation parameters based on audio scenes to further reduce distraction. However, running multiple audio understanding models on headphones with a limited energy budget and on-chip…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 5 pages, 2 figures, Accepted to ICASSP 2024

  7. arXiv:2403.13679  [pdf, other]

    cs.CL

    RoleInteract: Evaluating the Social Interaction of Role-Playing Agents

    Authors: Hongzhan Chen, Hehong Chen, Ming Yan, Wenshen Xu, Xing Gao, Weizhou Shen, Xiaojun Quan, Chenliang Li, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing conversational agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing the conversational capability, role-specific knowledge, and stylistic attributes of these agents, there has been a noticeable gap in assessing their…

    Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  8. arXiv:2402.16107  [pdf, other]

    cs.CL

    Knowledge Fusion of Chat LLMs: A Preliminary Technical Report

    Authors: Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi

    Abstract: Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake kno…

    Submitted 28 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Technical Report, work in progress

  9. arXiv:2402.04601  [pdf, other]

    cs.CL cs.AI

    Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector

    Authors: Haihui Yang, Xiaojun Quan

    Abstract: Chinese grammatical error correction (CGEC) faces serious overcorrection challenges when employing autoregressive generative models such as sequence-to-sequence (Seq2Seq) models and decoder-only large language models (LLMs). While previous methods aim to address overcorrection in Seq2Seq models, they are difficult to adapt to decoder-only LLMs. In this paper, we propose an alignment-enhanced corre…

    Submitted 2 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted to Findings of ACL 2024

  10. arXiv:2402.00745  [pdf, other]

    cs.CL

    Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement

    Authors: Xin Quan, Marco Valentino, Louise A. Dennis, André Freitas

    Abstract: An increasing amount of research in Natural Language Inference (NLI) focuses on the application and evaluation of Large Language Models (LLMs) and their reasoning capabilities. Despite their success, however, LLMs are still prone to factual errors and inconsistencies in their explanations, offering limited control and interpretability for inference in complex domains. In this paper, we focus on et…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Camera-ready for EACL 2024

  11. arXiv:2401.10768  [pdf, other]

    cs.CL

    Knowledge Verification to Nip Hallucination in the Bud

    Authors: Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external know…

    Submitted 16 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Work in progress

  12. arXiv:2401.10491  [pdf, other]

    cs.CL

    Knowledge Fusion of Large Language Models

    Authors: Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: While training large language models (LLMs) from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weigh…

    Submitted 22 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  13. arXiv:2401.07324  [pdf, other]

    cs.AI cs.CL

    Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

    Authors: Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang

    Abstract: Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete various tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers accurately but also excel in task planning, tool invocation, and result summarizati…

    Submitted 16 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: In progress; GitHub repo: https://github.com/X-PLUG/Multi-LLM-Agent

  14. arXiv:2401.07013  [pdf, other]

    cs.CL

    Knowledge Distillation for Closed-Source Language Models

    Authors: Hongzhan Chen, Xiaojun Quan, Hehong Chen, Ming Yan, Ji Zhang

    Abstract: Closed-source language models such as GPT-4 have achieved remarkable performance. Many recent studies focus on enhancing the capabilities of smaller models through knowledge distillation from closed-source language models. However, due to the incapability to directly access the weights, hidden states, and output distributions of these closed-source models, the distillation can only be performed by…

    Submitted 13 January, 2024; originally announced January 2024.

  15. arXiv:2310.20256  [pdf, other]

    cs.CL

    PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection

    Authors: Tao Yang, Tianyuan Shi, Fanqi Wan, Xiaojun Quan, Qifan Wang, Bingzhe Wu, Jiaxiang Wu

    Abstract: Recent advances in large language models (LLMs), such as ChatGPT, have showcased remarkable zero-shot performance across various NLP tasks. However, the potential of LLMs in personality detection, which involves identifying an individual's personality from their written texts, remains largely unexplored. Drawing inspiration from Psychological Questionnaires, which are carefully designed by psychol…

    Submitted 4 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  16. arXiv:2310.14747  [pdf, other]

    cs.CL

    MCC-KD: Multi-CoT Consistent Knowledge Distillation

    Authors: Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang

    Abstract: Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been a growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both the diversity and consistency in rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propo…

    Submitted 20 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  17. arXiv:2310.14528  [pdf, other]

    cs.CL

    Dual-Feedback Knowledge Retrieval for Task-Oriented Dialogue Systems

    Authors: Tianyuan Shi, Liangzhi Li, Zijian Lin, Tao Yang, Xiaojun Quan, Qifan Wang

    Abstract: Efficient knowledge retrieval plays a pivotal role in ensuring the success of end-to-end task-oriented dialogue systems by facilitating the selection of relevant information necessary to fulfill user requests. However, current approaches generally integrate knowledge retrieval and response generation, which poses scalability challenges when dealing with extensive knowledge bases. Taking inspiratio…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Main Conference)

  18. arXiv:2310.09168  [pdf, other]

    cs.CL

    Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

    Authors: Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi

    Abstract: Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a nove…

    Submitted 24 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Main Conference)

  19. arXiv:2310.08877  [pdf, other]

    cs.CL

    Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System

    Authors: Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi

    Abstract: Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to differentiate subtle differences among the retrieved KB records when generating responses, resulting in suboptimal quality…

    Submitted 20 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Main Conference

  20. arXiv:2305.14783  [pdf, other]

    cs.CL

    Disentangled Phonetic Representation for Chinese Spelling Correction

    Authors: Zihong Liang, Xiaojun Quan, Qifan Wang

    Abstract: Chinese Spelling Correction (CSC) aims to detect and correct erroneous characters in Chinese texts. Although efforts have been made to introduce phonetic information (Hanyu Pinyin) in this task, they typically merge phonetic representations with character representations, which tends to weaken the representation effect of normal texts. In this work, we propose to disentangle the two types of featu…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Main Conference

  21. arXiv:2305.10149  [pdf, other]

    cs.CL

    Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

    Authors: Fanqi Wan, Weizhou Shen, Ke Yang, Xiaojun Quan, Wei Bi

    Abstract: Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address th…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 (Main Conference)

  22. arXiv:2305.10010  [pdf, other]

    cs.CL

    AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression

    Authors: Siyue Wu, Hongzhan Chen, Xiaojun Quan, Qifan Wang, Rui Wang

    Abstract: Knowledge distillation has attracted a great deal of interest recently to compress pre-trained language models. However, existing knowledge distillation methods suffer from two limitations. First, the student model simply imitates the teacher's behavior while ignoring the underlying reasoning. Second, these methods usually focus on the transfer of sophisticated model-specific knowledge but overloo…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Main Conference

  23. arXiv:2305.09892  [pdf, other]

    cs.CL cs.AI

    Clustering-Aware Negative Sampling for Unsupervised Sentence Representation

    Authors: Jinghao Deng, Fanqi Wan, Tao Yang, Xiaojun Quan, Rui Wang

    Abstract: Contrastive learning has been widely studied in sentence representation learning. However, earlier works mainly focus on the construction of positive examples, while in-batch samples are often simply treated as negative examples. This approach overlooks the importance of selecting appropriate negative examples, potentially leading to a scarcity of hard negatives and the inclusion of false negative…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023, 16 pages

  24. arXiv:2302.10680  [pdf, other]

    cs.CL

    Generic Dependency Modeling for Multi-Party Conversation

    Authors: Weizhou Shen, Xiaojun Quan, Ke Yang

    Abstract: To model the dependencies between utterances in multi-party conversations, we propose a simple and generic framework based on the dependency parsing results of utterances. Particularly, we present an approach to encoding the dependencies in the form of relative dependency encoding (ReDE) and illustrate how to implement it in Transformers by modifying the computation of self-attention. Experimental…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted to ICASSP 2023

  25. arXiv:2212.01515  [pdf, other]

    cs.CL

    Orders Are Unwanted: Dynamic Deep Graph Convolutional Network for Personality Detection

    Authors: Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang

    Abstract: Predicting personality traits based on online posts has emerged as an important task in many fields such as social network analysis. One of the challenges of this task is assembling information from various posts into an overall profile for each user. While many previous solutions simply concatenate the posts into a long document and then encode the document by sequential or hierarchical models, t…

    Submitted 4 April, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: AAAI 2023 camera-ready

  26. arXiv:2210.05883  [pdf, other]

    cs.CL

    AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

    Authors: Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie

    Abstract: Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  27. arXiv:2210.04457  [pdf, other]

    cs.CL cs.LG

    XPrompt: Exploring the Extreme of Prompt Tuning

    Authors: Fang Ma, Chen Zhang, Lei Ren, Jingang Wang, Qifan Wang, Wei Wu, Xiaojun Quan, Dawei Song

    Abstract: Prompt tuning learns soft prompts to condition frozen Pre-trained Language Models (PLMs) for performing downstream tasks in a parameter-efficient manner. While prompt tuning has gradually reached the performance level of fine-tuning as the model scale increases, there is still a large performance gap between prompt tuning and fine-tuning for models of moderate and small scales (typically less than…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: 15 pages, accepted to EMNLP 2022 main conference

  28. arXiv:2209.08708  [pdf, other]

    cs.CL

    Autoregressive Entity Generation for End-to-End Task-Oriented Dialog

    Authors: Guanhuan Huang, Xiaojun Quan, Qifan Wang

    Abstract: Task-oriented dialog (TOD) systems often require interaction with an external knowledge base to retrieve necessary entity (e.g., restaurant) information to support the response generation. Most current end-to-end TOD systems either retrieve the KB information explicitly or embed it into model parameters for implicit access. While the former approach demands scanning the KB at each turn of response…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: Accepted to COLING 2022

  29. arXiv:2209.07239  [pdf, other]

    cs.CL

    UBARv2: Towards Mitigating Exposure Bias in Task-Oriented Dialogs

    Authors: Yunyi Yang, Hong Ding, Qingyi Liu, Xiaojun Quan

    Abstract: This paper studies the exposure bias problem in task-oriented dialog systems, where the model's generated content over multiple turns drives the dialog context away from the ground-truth distribution at training time, introducing error propagation and damaging the robustness of the TOD system. To bridge the gap between training and inference for multi-turn task-oriented dialogs, we propose session…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: 15 pages, 8 figures

  30. arXiv:2206.13974  [pdf, other]

    cs.CL

    Joint Generator-Ranker Learning for Natural Language Generation

    Authors: Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen

    Abstract: Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among the text candidates. However, existing methods usually train the generator and the ranker individually, neglecting the mutual feedback that could further enhance the generation quality. To tackle this limitation, we propose JGR, a novel join…

    Submitted 28 May, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

  31. GL-RG: Global-Local Representation Granularity for Video Captioning

    Authors: Liqi Yan, Qifan Wang, Yiming Cui, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu

    Abstract: Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GL-RG framework…

    Submitted 28 February, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: Accepted to IJCAI 2022

  32. arXiv:2203.02656  [pdf, other]

    cs.LG cs.SI

    Deep Partial Multiplex Network Embedding

    Authors: Qifan Wang, Yi Fang, Anirudh Ravula, Ruining He, Bin Shen, Jingang Wang, Xiaojun Quan, Dongfang Liu

    Abstract: Network embedding is an effective technique to learn the low-dimensional representations of nodes in networks. Real-world networks are usually with multiplex or having multi-view representations from different relations. Recently, there has been increasing interest in network embedding on multiplex data. However, most existing multiplex approaches assume that the data is complete in all views. But…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted to WWW 2022 GL workshop

  33. arXiv:2202.00217  [pdf, other]

    cs.CL

    WebFormer: The Web-page Transformer for Structure Information Extraction

    Authors: Qifan Wang, Yi Fang, Anirudh Ravula, Fuli Feng, Xiaojun Quan, Dongfang Liu

    Abstract: Structure information extraction refers to the task of extracting structured text fields from web pages, such as extracting a product offer from a shopping page including product title, description, brand and price. It is an important research topic which has been widely studied in document understanding and web search. Recent natural language models with sequence modeling have demonstrated state-…

    Submitted 31 January, 2022; originally announced February 2022.

    Comments: Accepted to WWW 2022

  34. arXiv:2106.04963  [pdf, other]

    cs.CL

    Psycholinguistic Tripartite Graph Network for Personality Detection

    Authors: Tao Yang, Feifan Yang, Haolan Ouyang, Xiaojun Quan

    Abstract: Most of the recent work on personality detection from online posts adopts multifarious deep neural networks to represent the posts and builds predictive models in a data-driven manner, without the exploitation of psycholinguistic knowledge that may unveil the connections between one's language usage and his psychological traits. In this paper, we propose a psycholinguistic knowledge-based triparti…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted by ACL 2021

  35. arXiv:2106.02327  [pdf, other]

    cs.CL

    Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene

    Authors: Ruikun Luo, Guanhuan Huang, Xiaojun Quan

    Abstract: The major paradigm of applying a pre-trained language model to downstream tasks is to fine-tune it on labeled task data, which often suffers instability and low performance when the labeled examples are scarce. One way to alleviate this problem is to apply post-training on unlabeled task data before fine-tuning, adapting the pre-trained model to target domains by contrastive learning that consider…

    Submitted 4 June, 2021; originally announced June 2021.

  36. arXiv:2106.02317  [pdf, other]

    cs.CL

    Retrieve & Memorize: Dialog Policy Learning with Multi-Action Memory

    Authors: Yunhao Li, Yunyi Yang, Xiaojun Quan, Jianxing Yu

    Abstract: Dialogue policy learning, a subtask that determines the content of system response generation and then the degree of task completion, is essential for task-oriented dialogue systems. However, the unbalanced distribution of system actions in dialogue datasets often causes difficulty in learning to generate desired actions and responses. In this paper, we propose a retrieve-and-memorize framework to…

    Submitted 26 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL 2021 Findings

  37. arXiv:2105.12907  [pdf, other]

    cs.CL

    Directed Acyclic Graph Network for Conversational Emotion Recognition

    Authors: Weizhou Shen, Siyue Wu, Yunyi Yang, Xiaojun Quan

    Abstract: The modeling of conversational context plays a vital role in emotion recognition from conversation (ERC). In this paper, we put forward a novel idea of encoding the utterances with a directed acyclic graph (DAG) to better model the intrinsic structure within a conversation, and design a directed acyclic neural network, namely DAG-ERC, to implement this idea. In an attempt to combine the strengths…

    Submitted 15 September, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: Accepted to ACL-IJCNLP 2021 main conference

  38. arXiv:2012.14116  [pdf, other]

    cs.CL

    Syntax-Enhanced Pre-trained Model

    Authors: Zenan Xu, Daya Guo, Duyu Tang, Qinliang Su, Linjun Shou, Ming Gong, Wanjun Zhong, Xiaojun Quan, Nan Duan, Daxin Jiang

    Abstract: We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages. Such a problem would lead to the necessity of having human-annotated syntactic information, which limits the appli…

    Submitted 29 May, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: Accepted by ACL-IJCNLP 2021: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing

  39. arXiv:2012.08695  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition

    Authors: Weizhou Shen, Junqing Chen, Xiaojun Quan, Zhixian Xie

    Abstract: This paper presents our pioneering effort for emotion recognition in conversation (ERC) with pre-trained language models. Unlike regular documents, conversational utterances appear alternately from different parties and are usually organized as hierarchical structures in previous work. Such structures are not conducive to the application of pre-trained language models such as XLNet. To address thi…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: Accepted by AAAI 2021 main conference

  40. arXiv:2012.03539  [pdf, other]

    cs.CL

    UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2

    Authors: Yunyi Yang, Yunhao Li, Xiaojun Quan

    Abstract: This paper presents our task-oriented dialog system UBAR which models task-oriented dialogs on a dialog session level. Specifically, UBAR is acquired by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of the entire dialog session which is composed of user utterance, belief state, database result, system act, and system response of every dialog turn. Additional…

    Submitted 17 March, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: Accepted by AAAI 2021

  41. arXiv:2010.14047  [pdf, other]

    cs.SI

    Embedding Dynamic Attributed Networks by Modeling the Evolution Processes

    Authors: Zenan Xu, Zijing Ou, Qinliang Su, Jianxing Yu, Xiaojun Quan, Zhenkun Lin

    Abstract: Network embedding has recently emerged as a promising technique to embed nodes of a network into low-dimensional vectors. While fairly successful, most existing works focus on the embedding techniques for static networks. But in practice, there are many networks that are evolving over time and hence are dynamic, e.g., the social networks. To address this issue, a high-order spatio-temporal embeddi…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Accepted by COLING 2020: The 28th International Conference on Computational Linguistics

  42. arXiv:2004.14769  [pdf, other]

    cs.CL

    Conditional Augmentation for Aspect Term Extraction via Masked Sequence-to-Sequence Generation

    Authors: Kun Li, Chengbo Chen, Xiaojun Quan, Qing Ling, Yan Song

    Abstract: Aspect term extraction aims to extract aspect terms from review texts as opinion targets for sentiment analysis. One of the big challenges with this task is the lack of sufficient annotated data. While data augmentation is potentially an effective technique to address the above issue, it is uncontrollable as it may change aspect words and aspect labels unexpectedly. In this paper, we formulate the…

    Submitted 1 May, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: To appear at ACL 2020

  43. arXiv:2004.12363  [pdf, other]

    cs.CL cs.AI

    Multi-Domain Dialogue Acts and Response Co-Generation

    Authors: Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu

    Abstract: Generating fluent and informative responses is of critical importance for task-oriented dialogue systems. Existing pipeline approaches generally predict multiple dialogue acts first and use them to assist response generation. There are at least two shortcomings with such approaches. First, the inherent structures of multi-domain dialogue acts are neglected. Second, the semantic associations betwee…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: To appear at ACL 2020

  44. arXiv:2004.12362  [pdf, other]

    cs.CL cs.LG

    Relational Graph Attention Network for Aspect-based Sentiment Analysis

    Authors: Kai Wang, Weizhou Shen, Yunyi Yang, Xiaojun Quan, Rui Wang

    Abstract: Aspect-based sentiment analysis aims to determine the sentiment polarity towards a specific aspect in online reviews. Most recent efforts adopt attention-based neural network models to implicitly connect aspects with opinion words. However, due to the complexity of language and the existence of multiple aspects in a single sentence, these models often confuse the connections. In this paper, we add…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: To appear at ACL 2020

  45. arXiv:1908.11057  [pdf, other]

    cs.SI cs.CL cs.LG

    A Deep Neural Information Fusion Architecture for Textual Network Embeddings

    Authors: Zenan Xu, Qinliang Su, Xiaojun Quan, Weijia Zhang

    Abstract: Textual network embeddings aim to learn a low-dimensional representation for every node in the network so that both the structural and textual information from the networks can be well preserved in the representations. Traditionally, the structural and textual embeddings were learned by models that rarely take the mutual influences between them into account. In this paper, a deep neural architectu…

    Submitted 12 August, 2021; v1 submitted 29 August, 2019; originally announced August 2019.

    Comments: To appear at EMNLP-IJCNLP 2019 (Conference on Empirical Methods in Natural Language Processing & International Joint Conference on Natural Language Processing 2019)

  46. arXiv:1906.05012  [pdf, other]

    cs.CL cs.IR

    BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization

    Authors: Kai Wang, Xiaojun Quan, Rui Wang

    Abstract: The success of neural summarization models stems from the meticulous encodings of source articles. To overcome the impediments of limited and sometimes noisy training data, one promising direction is to make better use of the available training data by applying filters during summarization. In this paper, we propose a novel Bi-directional Selective Encoding with Template (BiSET) model, which lever…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)