Search | arXiv e-print repository

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Authors: Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a leng… ▽ More Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a lengthy or multi-hop reasoning chain, where the intermediate outcomes are neither properly rewarded nor penalized. Process supervision addresses this limitation by assigning intermediate rewards during the reasoning process. To date, the methods used to collect process supervision data have relied on either human annotation or per-step Monte Carlo estimation, both prohibitively expensive to scale, thus hindering the broad application of this technique. In response to this challenge, we propose a novel divide-and-conquer style Monte Carlo Tree Search (MCTS) algorithm named \textit{OmegaPRM} for the efficient collection of high-quality process supervision data. This algorithm swiftly identifies the first error in the Chain of Thought (CoT) with binary search and balances the positive and negative examples, thereby ensuring both efficiency and quality. As a result, we are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM). Utilizing this fully automated process supervision alongside the weighted self-consistency algorithm, we have enhanced the instruction tuned Gemini Pro model's math reasoning performance, achieving a 69.4\% success rate on the MATH benchmark, a 36\% relative improvement from the 51\% base model performance. Additionally, the entire process operates without any human intervention, making our method both financially and computationally cost-effective compared to existing methods. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 18 pages, 5 figures, 1 table

arXiv:2405.16178 [pdf, other]

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Authors: Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen

Abstract: Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse R… ▽ More Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse RAG encodes retrieved documents in parallel, which eliminates latency introduced by long-range attention of retrieved documents. Then, LLMs selectively decode the output by only attending to highly relevant caches auto-regressively, which are chosen via prompting LLMs with special control tokens. It is notable that Sparse RAG combines the assessment of each individual document and the generation of the response into a single process. The designed sparse mechanism in a RAG system can facilitate the reduction of the number of documents loaded during decoding for accelerating the inference of the RAG system. Additionally, filtering out undesirable contexts enhances the model's focus on relevant context, inherently improving its generation quality. Evaluation results of two datasets show that Sparse RAG can strike an optimal balance between generation quality and computational efficiency, demonstrating its generalizability across both short- and long-form generation tasks. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.07429 [pdf, other]

JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation

Authors: Xubo Luo, Xue Wan, Yixing Gao, Yaolin Tian, Wei Zhang, Leizheng Shu

Abstract: Unmanned aerial vehicles (UAVs) visual localization in planetary aims to estimate the absolute pose of the UAV in the world coordinate system through satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning… ▽ More Unmanned aerial vehicles (UAVs) visual localization in planetary aims to estimate the absolute pose of the UAV in the world coordinate system through satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning will be reduced. In order to accurately determine the position of the UAV in a planetary scene in the absence of the global navigation satellite system (GNSS), this paper proposes JointLoc, which estimates the real-time UAV position in the world coordinate system by adaptively fusing the absolute 2-degree-of-freedom (2-DoF) pose and the relative 6-degree-of-freedom (6-DoF) pose. Extensive comparative experiments were conducted on a proposed planetary UAV image cross-modal localization dataset, which contains three types of typical Martian topography generated via a simulation engine as well as real Martian UAV images from the Ingenuity helicopter. JointLoc achieved a root-mean-square error of 0.237m in the trajectories of up to 1,000m, compared to 0.594m and 0.557m for ORB-SLAM2 and ORB-SLAM3 respectively. The source code will be available at https://github.com/LuoXubo/JointLoc. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: 8 pages

arXiv:2403.09030 [pdf]

An AI-Driven Approach to Wind Turbine Bearing Fault Diagnosis from Acoustic Signals

Authors: Zhao Wang, Xiaomeng Li, Na Li, Longlong Shu

Abstract: This study aimed to develop a deep learning model for the classification of bearing faults in wind turbine generators from acoustic signals. A convolutional LSTM model was successfully constructed and trained by using audio data from five predefined fault types for both training and validation. To create the dataset, raw audio signal data was collected and processed in frames to capture time and f… ▽ More This study aimed to develop a deep learning model for the classification of bearing faults in wind turbine generators from acoustic signals. A convolutional LSTM model was successfully constructed and trained by using audio data from five predefined fault types for both training and validation. To create the dataset, raw audio signal data was collected and processed in frames to capture time and frequency domain information. The model exhibited outstanding accuracy on training samples and demonstrated excellent generalization ability during validation, indicating its proficiency of generalization capability. On the test samples, the model achieved remarkable classification performance, with an overall accuracy exceeding 99.5%, and a false positive rate of less than 1% for normal status. The findings of this study provide essential support for the diagnosis and maintenance of bearing faults in wind turbine generators, with the potential to enhance the reliability and efficiency of wind power generation. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2401.07382 [pdf, other]

Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation

Authors: Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng

Abstract: Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only a single reward for an entire output. This sparsity of rewards can lead to inefficient and unstable learning. To address this challenge, our paper introduces an novel framework… ▽ More Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only a single reward for an entire output. This sparsity of rewards can lead to inefficient and unstable learning. To address this challenge, our paper introduces an novel framework that utilizes the critique capability of Large Language Models (LLMs) to produce intermediate-step rewards during RL training. Our method involves coupling a policy model with a critic language model, which is responsible for providing comprehensive feedback of each part of the output. This feedback is then translated into token or span-level rewards that can be used to guide the RL training process. We investigate this approach under two different settings: one where the policy model is smaller and is paired with a more powerful critic model, and another where a single language model fulfills both roles. We assess our approach on three text generation tasks: sentiment control, language model detoxification, and summarization. Experimental results show that incorporating artificial intrinsic rewards significantly improve both sample efficiency and the overall performance of the policy model, supported by both automatic and human evaluation. △ Less

Submitted 19 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

arXiv:2311.16344 [pdf, other]

Spatially Adaptive Cloth Regression with Implicit Neural Representations

Authors: Lei Shu, Vinicius Azevedo, Barbara Solenthaler, Markus Gross

Abstract: The accurate representation of fine-detailed cloth wrinkles poses significant challenges in computer graphics. The inherently non-uniform structure of cloth wrinkles mandates the employment of intricate discretization strategies, which are frequently characterized by high computational demands and complex methodologies. Addressing this, the research introduced in this paper elucidates a novel anis… ▽ More The accurate representation of fine-detailed cloth wrinkles poses significant challenges in computer graphics. The inherently non-uniform structure of cloth wrinkles mandates the employment of intricate discretization strategies, which are frequently characterized by high computational demands and complex methodologies. Addressing this, the research introduced in this paper elucidates a novel anisotropic cloth regression technique that capitalizes on the potential of implicit neural representations of surfaces. Our first core contribution is an innovative mesh-free sampling approach, crafted to reduce the reliance on traditional mesh structures, thereby offering greater flexibility and accuracy in capturing fine cloth details. Our second contribution is a novel adversarial training scheme, which is designed meticulously to strike a harmonious balance between the sampling and simulation objectives. The adversarial approach ensures that the wrinkles are represented with high fidelity, while also maintaining computational efficiency. Our results showcase through various cloth-object interaction scenarios that our method, given the same memory constraints, consistently surpasses traditional discrete representations, particularly when modelling highly-detailed localized wrinkles. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: 16 pages, 13 figures

MSC Class: 68T07 ACM Class: I.3.0

arXiv:2311.09204 [pdf, other]

Fusion-Eval: Integrating Assistant Evaluators with LLMs

Authors: Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng

Abstract: Evaluating natural language systems poses significant challenges, particularly in the realms of natural language understanding and high-level reasoning. In this paper, we introduce 'Fusion-Eval', an innovative approach that leverages Large Language Models (LLMs) to integrate insights from various assistant evaluators. The LLM is given the example to evaluate along with scores from the assistant ev… ▽ More Evaluating natural language systems poses significant challenges, particularly in the realms of natural language understanding and high-level reasoning. In this paper, we introduce 'Fusion-Eval', an innovative approach that leverages Large Language Models (LLMs) to integrate insights from various assistant evaluators. The LLM is given the example to evaluate along with scores from the assistant evaluators. Each of these evaluators specializes in assessing distinct aspects of responses. Fusion-Eval achieves a 0.962 system-level Kendall-Tau correlation with humans on SummEval and a 0.744 turn-level Spearman correlation on TopicalChat, which is significantly higher than baseline methods. These results highlight Fusion-Eval's significant potential in the realm of natural language system evaluation. △ Less

Submitted 6 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.09179 [pdf, other]

SiRA: Sparse Mixture of Low Rank Adaptation

Authors: Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, Lei Shu, Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng

Abstract: Parameter Efficient Tuning has been an prominent approach to adapt the Large Language Model to downstream tasks. Most previous works considers adding the dense trainable parameters, where all parameters are used to adapt certain task. We found this less effective empirically using the example of LoRA that introducing more trainable parameters does not help. Motivated by this we investigate the imp… ▽ More Parameter Efficient Tuning has been an prominent approach to adapt the Large Language Model to downstream tasks. Most previous works considers adding the dense trainable parameters, where all parameters are used to adapt certain task. We found this less effective empirically using the example of LoRA that introducing more trainable parameters does not help. Motivated by this we investigate the importance of leveraging "sparse" computation and propose SiRA: sparse mixture of low rank adaption. SiRA leverages the Sparse Mixture of Expert(SMoE) to boost the performance of LoRA. Specifically it enforces the top $k$ experts routing with a capacity limit restricting the maximum number of tokens each expert can process. We propose a novel and simple expert dropout on top of gating network to reduce the over-fitting issue. Through extensive experiments, we verify SiRA performs better than LoRA and other mixture of expert approaches across different single tasks and multitask settings. △ Less

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.04815 [pdf, other]

Critique Ability of Large Language Models

Authors: Liangchen Luo, Zi Lin, Yinxiao Liu, Lei Shu, Yun Zhu, Jingbo Shang, Lei Meng

Abstract: Critical thinking is essential for rational decision-making and problem-solving. This skill hinges on the ability to provide precise and reasoned critiques and is a hallmark of human intelligence. In the era of large language models (LLMs), this study explores the ability of LLMs to deliver accurate critiques across various tasks. We are interested in this topic as a capable critic model could not… ▽ More Critical thinking is essential for rational decision-making and problem-solving. This skill hinges on the ability to provide precise and reasoned critiques and is a hallmark of human intelligence. In the era of large language models (LLMs), this study explores the ability of LLMs to deliver accurate critiques across various tasks. We are interested in this topic as a capable critic model could not only serve as a reliable evaluator, but also as a source of supervised signals for model tuning. Particularly, if a model can self-critique, it has the potential for autonomous self-improvement. To examine this, we introduce a unified evaluation framework for assessing the critique abilities of LLMs. We develop a benchmark called CriticBench, which comprises 3K high-quality natural language queries and corresponding model responses; and annotate the correctness of these responses. The benchmark cover tasks such as math problem-solving, code completion, and question answering. We evaluate multiple LLMs on the collected dataset and our analysis reveals several noteworthy insights: (1) Critique is generally challenging for most LLMs, and this capability often emerges only when models are sufficiently large. (2) In particular, self-critique is especially difficult. Even top-performing LLMs struggle to achieve satisfactory performance. (3) Models tend to have lower critique accuracy on problems where they are most uncertain. To this end, we introduce a simple yet effective baseline named self-check, which leverages self-critique to improve task performance for various models. We hope this study serves as an initial exploration into understanding the critique abilities of LLMs, and aims to inform future research, including the development of more proficient critic models and the application of critiques across diverse tasks. △ Less

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2308.11807 [pdf, other]

Towards an On-device Agent for Text Rewriting

Authors: Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting. Nonetheless, the large sizes of these models make them impractical for on-device inference, which would otherwise allow for enhanced privacy and economical inference. Creating a smaller yet potent language model for text rewriting presents a formidable challenge because it requires balancing the need for a s… ▽ More Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting. Nonetheless, the large sizes of these models make them impractical for on-device inference, which would otherwise allow for enhanced privacy and economical inference. Creating a smaller yet potent language model for text rewriting presents a formidable challenge because it requires balancing the need for a small size with the need to retain the emergent capabilities of the LLM, that requires costly data collection. To address the above challenge, we introduce a new instruction tuning approach for building a mobile-centric text rewriting model. Our strategies enable the generation of high quality training data without any human labeling. In addition, we propose a heuristic reinforcement learning framework which substantially enhances performance without requiring preference data. To further bridge the performance gap with the larger server-side model, we propose an effective approach that combines the mobile rewrite agent with the server model using a cascade. To tailor the text rewriting tasks to mobile scenarios, we introduce MessageRewriteEval, a benchmark that focuses on text rewriting for messages through natural language instructions. Through empirical experiments, we demonstrate that our on-device model surpasses the current state-of-the-art LLMs in text rewriting while maintaining a significantly reduced model size. Notably, we show that our proposed cascading approach improves model performance. △ Less

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2305.15685 [pdf, other]

RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting

Authors: Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Yinxiao Liu, Simon Tong, Jindong Chen, Lei Meng

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in creative tasks such as storytelling and E-mail generation. However, as LLMs are primarily trained on final text results rather than intermediate revisions, it might be challenging for them to perform text rewriting tasks. Most studies in the rewriting tasks focus on a particular transformation type within the boundaries of s… ▽ More Large Language Models (LLMs) have demonstrated impressive capabilities in creative tasks such as storytelling and E-mail generation. However, as LLMs are primarily trained on final text results rather than intermediate revisions, it might be challenging for them to perform text rewriting tasks. Most studies in the rewriting tasks focus on a particular transformation type within the boundaries of single sentences. In this work, we develop new strategies for instruction tuning and reinforcement learning to better align LLMs for cross-sentence rewriting tasks using diverse wording and structures expressed through natural languages including 1) generating rewriting instruction data from Wiki edits and public corpus through instruction generation and chain-of-thought prompting; 2) collecting comparison data for reward model training through a new ranking function. To facilitate this research, we introduce OpenRewriteEval, a novel benchmark covers a wide variety of rewriting types expressed through natural language instructions. Our results show significant improvements over a variety of baselines. The public repository is available on GitHub under Google Research (https://github.com/google-research/google-research/tree/master/rewritelm). △ Less

Submitted 19 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Journal ref: AAAI 2024

arXiv:2304.11658 [pdf, other]

Capturing Fine-grained Semantics in Contrastive Graph Representation Learning

Authors: Lin Shu, Chuan Chen, Zibin Zheng

Abstract: Graph contrastive learning defines a contrastive task to pull similar instances close and push dissimilar instances away. It learns discriminative node embeddings without supervised labels, which has aroused increasing attention in the past few years. Nevertheless, existing methods of graph contrastive learning ignore the differences between diverse semantics existed in graphs, which learn coarse-… ▽ More Graph contrastive learning defines a contrastive task to pull similar instances close and push dissimilar instances away. It learns discriminative node embeddings without supervised labels, which has aroused increasing attention in the past few years. Nevertheless, existing methods of graph contrastive learning ignore the differences between diverse semantics existed in graphs, which learn coarse-grained node embeddings and lead to sub-optimal performances on downstream tasks. To bridge this gap, we propose a novel Fine-grained Semantics enhanced Graph Contrastive Learning (FSGCL) in this paper. Concretely, FSGCL first introduces a motif-based graph construction, which employs graph motifs to extract diverse semantics existed in graphs from the perspective of input data. Then, the semantic-level contrastive task is explored to further enhance the utilization of fine-grained semantics from the perspective of model training. Experiments on five real-world datasets demonstrate the superiority of our proposed FSGCL over state-of-the-art methods. To make the results reproducible, we will make our codes public on GitHub after this paper is accepted. △ Less

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2301.08986 [pdf, other]

Adapting a Language Model While Preserving its General Knowledge

Authors: Zixuan Ke, Yijia Shao, Haowei Lin, Hu Xu, Lei Shu, Bing Liu

Abstract: Domain-adaptive pre-training (or DA-training for short), also known as post-training, aims to train a pre-trained general-purpose language model (LM) using an unlabeled corpus of a particular domain to adapt the LM so that end-tasks in the domain can give improved performances. However, existing DA-training methods are in some sense blind as they do not explicitly identify what knowledge in the LM… ▽ More Domain-adaptive pre-training (or DA-training for short), also known as post-training, aims to train a pre-trained general-purpose language model (LM) using an unlabeled corpus of a particular domain to adapt the LM so that end-tasks in the domain can give improved performances. However, existing DA-training methods are in some sense blind as they do not explicitly identify what knowledge in the LM should be preserved and what should be changed by the domain corpus. This paper shows that the existing methods are suboptimal and proposes a novel method to perform a more informed adaptation of the knowledge in the LM by (1) soft-masking the attention heads based on their importance to best preserve the general knowledge in the LM and (2) contrasting the representations of the general and the full (both general and domain knowledge) to learn an integrated representation with both general and domain-specific knowledge. Experimental results will demonstrate the effectiveness of the proposed approach. △ Less

Submitted 21 January, 2023; originally announced January 2023.

Comments: EMNLP 2022

arXiv:2210.05549 [pdf, other]

Continual Training of Language Models for Few-Shot Learning

Authors: Zixuan Ke, Haowei Lin, Yijia Shao, Hu Xu, Lei Shu, Bing Liu

Abstract: Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications. Adapting or posttraining an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain. This paper proposes the problem of continually extending an LM by incrementally post-train the LM with a sequence of unlabeled domain corpora to expand its knowl… ▽ More Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications. Adapting or posttraining an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain. This paper proposes the problem of continually extending an LM by incrementally post-train the LM with a sequence of unlabeled domain corpora to expand its knowledge without forgetting its previous skills. The goal is to improve the few-shot end-task learning in these domains. The resulting system is called CPT (Continual PostTraining), which to our knowledge, is the first continual post-training system. Experimental results verify its effectiveness. △ Less

Submitted 11 October, 2022; originally announced October 2022.

Journal ref: EMNLP 2022

arXiv:2208.13685 [pdf, other]

FedEgo: Privacy-preserving Personalized Federated Graph Learning with Ego-graphs

Authors: Taolin Zhang, Chuan Chen, Yaomin Chang, Lin Shu, Zibin Zheng

Abstract: As special information carriers containing both structure and feature information, graphs are widely used in graph mining, e.g., Graph Neural Networks (GNNs). However, in some practical scenarios, graph data are stored separately in multiple distributed parties, which may not be directly shared due to conflicts of interest. Hence, federated graph neural networks are proposed to address such data s… ▽ More As special information carriers containing both structure and feature information, graphs are widely used in graph mining, e.g., Graph Neural Networks (GNNs). However, in some practical scenarios, graph data are stored separately in multiple distributed parties, which may not be directly shared due to conflicts of interest. Hence, federated graph neural networks are proposed to address such data silo problems while preserving the privacy of each party (or client). Nevertheless, different graph data distributions among various parties, which is known as the statistical heterogeneity, may degrade the performance of naive federated learning algorithms like FedAvg. In this paper, we propose FedEgo, a federated graph learning framework based on ego-graphs to tackle the challenges above, where each client will train their local models while also contributing to the training of a global model. FedEgo applies GraphSAGE over ego-graphs to make full use of the structure information and utilizes Mixup for privacy concerns. To deal with the statistical heterogeneity, we integrate personalization into learning and propose an adaptive mixing coefficient strategy that enables clients to achieve their optimal personalization. Extensive experimental results and in-depth analysis demonstrate the effectiveness of FedEgo. △ Less

Submitted 9 September, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

Comments: 25 pages, submitted to ACM Transactions on Knowledge Discovery from Data (TKDD)

arXiv:2203.13238 [pdf, other]

Open-set Recognition via Augmentation-based Similarity Learning

Authors: Sepideh Esmaeilpour, Lei Shu, Bing Liu

Abstract: The primary assumption of conventional supervised learning or classification is that the test samples are drawn from the same distribution as the training samples, which is called closed set learning or classification. In many practical scenarios, this is not the case because there are unknowns or unseen class samples in the test data, which is called the open set scenario, and the unknowns need t… ▽ More The primary assumption of conventional supervised learning or classification is that the test samples are drawn from the same distribution as the training samples, which is called closed set learning or classification. In many practical scenarios, this is not the case because there are unknowns or unseen class samples in the test data, which is called the open set scenario, and the unknowns need to be detected. This problem is referred to as the open set recognition problem and is important in safety-critical applications. We propose to detect unknowns (or unseen class samples) through learning pairwise similarities. The proposed method works in two steps. It first learns a closed set classifier using the seen classes that have appeared in training and then learns how to compare seen classes with pseudo-unseen (automatically generated unseen class samples). The pseudo-unseen generation is carried out by performing distribution shifting augmentations on the seen or training samples. We call our method OPG (Open set recognition based on Pseudo unseen data Generation). The experimental evaluation shows that the learned similarity-based features can successfully distinguish seen from unseen in benchmark datasets for open set recognition. △ Less

Submitted 21 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

arXiv:2202.02976 [pdf, other]

Measuring and Reducing Model Update Regression in Structured Prediction for NLP

Authors: Deng Cai, Elman Mansimov, Yi-An Lai, Yixuan Su, Lei Shu, Yi Zhang

Abstract: Recent advance in deep learning has led to the rapid adoption of machine learning-based NLP models in a wide range of applications. Despite the continuous gain in accuracy, backward compatibility is also an important aspect for industrial applications, yet it received little research attention. Backward compatibility requires that the new model does not regress on cases that were correctly handled… ▽ More Recent advance in deep learning has led to the rapid adoption of machine learning-based NLP models in a wide range of applications. Despite the continuous gain in accuracy, backward compatibility is also an important aspect for industrial applications, yet it received little research attention. Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor. This work studies model update regression in structured prediction tasks. We choose syntactic dependency parsing and conversational semantic parsing as representative examples of structured prediction tasks in NLP. First, we measure and analyze model update regression in different model update settings. Next, we explore and benchmark existing techniques for reducing model update regression including model ensemble and knowledge distillation. We further propose a simple and effective method, Backward-Congruent Re-ranking (BCR), by taking into account the characteristics of structured prediction. Experiments show that BCR can better mitigate model update regression than model ensemble and knowledge distillation approaches. △ Less

Submitted 8 October, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: NeurIPS2022

arXiv:2202.01924 [pdf, other]

Zero-Shot Aspect-Based Sentiment Analysis

Authors: Lei Shu, Hu Xu, Bing Liu, Jiahua Chen

Abstract: Aspect-based sentiment analysis (ABSA) typically requires in-domain annotated data for supervised training/fine-tuning. It is a big challenge to scale ABSA to a large number of new domains. This paper aims to train a unified model that can perform zero-shot ABSA without using any annotated data for a new domain. We propose a method called contrastive post-training on review Natural Language Infere… ▽ More Aspect-based sentiment analysis (ABSA) typically requires in-domain annotated data for supervised training/fine-tuning. It is a big challenge to scale ABSA to a large number of new domains. This paper aims to train a unified model that can perform zero-shot ABSA without using any annotated data for a new domain. We propose a method called contrastive post-training on review Natural Language Inference (CORN). Later ABSA tasks can be cast into NLI for zero-shot transfer. We evaluate CORN on ABSA tasks, ranging from aspect extraction (AE), aspect sentiment classification (ASC), to end-to-end aspect-based sentiment analysis (E2E ABSA), which show ABSA can be conducted without any human annotated ABSA data. △ Less

Submitted 14 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

arXiv:2112.10021 [pdf, other]

Continual Learning with Knowledge Transfer for Sentiment Classification

Authors: Zixuan Ke, Bing Liu, Hao Wang, Lei Shu

Abstract: This paper studies continual learning (CL) for sentiment classification (SC). In this setting, the CL system learns a sequence of SC tasks incrementally in a neural network, where each task builds a classifier to classify the sentiment of reviews of a particular product category or domain. Two natural questions are: Can the system transfer the knowledge learned in the past from the previous tasks… ▽ More This paper studies continual learning (CL) for sentiment classification (SC). In this setting, the CL system learns a sequence of SC tasks incrementally in a neural network, where each task builds a classifier to classify the sentiment of reviews of a particular product category or domain. Two natural questions are: Can the system transfer the knowledge learned in the past from the previous tasks to the new task to help it learn a better model for the new task? And, can old models for previous tasks be improved in the process as well? This paper proposes a novel technique called KAN to achieve these objectives. KAN can markedly improve the SC accuracy of both the new task and the old tasks via forward and backward knowledge transfer. The effectiveness of KAN is demonstrated through extensive experiments. △ Less

Submitted 18 December, 2021; originally announced December 2021.

Journal ref: ECML-PKDD 2020

arXiv:2112.02714 [pdf, other]

CLASSIC: Continual and Contrastive Learning of Aspect Sentiment Classification Tasks

Authors: Zixuan Ke, Bing Liu, Hu Xu, Lei Shu

Abstract: This paper studies continual learning (CL) of a sequence of aspect sentiment classification(ASC) tasks in a particular CL setting called domain incremental learning (DIL). Each task is from a different domain or product. The DIL setting is particularly suited to ASC because in testing the system needs not know the task/domain to which the test data belongs. To our knowledge, this setting has not b… ▽ More This paper studies continual learning (CL) of a sequence of aspect sentiment classification(ASC) tasks in a particular CL setting called domain incremental learning (DIL). Each task is from a different domain or product. The DIL setting is particularly suited to ASC because in testing the system needs not know the task/domain to which the test data belongs. To our knowledge, this setting has not been studied before for ASC. This paper proposes a novel model called CLASSIC. The key novelty is a contrastive continual learning method that enables both knowledge transfer across tasks and knowledge distillation from old tasks to the new task, which eliminates the need for task ids in testing. Experimental results show the high effectiveness of CLASSIC. △ Less

Submitted 5 December, 2021; originally announced December 2021.

Journal ref: EMNLP 2021

arXiv:2112.02706 [pdf, other]

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning

Authors: Zixuan Ke, Bing Liu, Nianzu Ma, Hu Xu, Lei Shu

Abstract: Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT) across tasks. However, most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus do not do well in KT. Although several papers have tried to deal with both CF and K… ▽ More Continual learning (CL) learns a sequence of tasks incrementally with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT) across tasks. However, most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus do not do well in KT. Although several papers have tried to deal with both CF and KT, our experiments show that they suffer from serious CF when the tasks do not have much shared knowledge. Another observation is that most current CL methods do not use pre-trained models, but it has been shown that such models can significantly improve the end task performance. For example, in natural language processing, fine-tuning a BERT-like pre-trained language model is one of the most effective approaches. However, for CL, this approach suffers from serious CF. An interesting question is how to make the best use of pre-trained models for CL. This paper proposes a novel model called CTR to solve these problems. Our experimental results demonstrate the effectiveness of CTR △ Less

Submitted 5 December, 2021; originally announced December 2021.

Journal ref: NeurIPS 2021

arXiv:2111.04198 [pdf, other]

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Authors: Yixuan Su, Fangyu Liu, Zaiqiao Meng, Tian Lan, Lei Shu, Ehsan Shareghi, Nigel Collier

Abstract: Masked language models (MLMs) such as BERT and RoBERTa have revolutionized the field of Natural Language Understanding in the past few years. However, existing pre-trained MLMs often output an anisotropic distribution of token representations that occupies a narrow subset of the entire representation space. Such token representations are not ideal, especially for tasks that demand discriminative s… ▽ More Masked language models (MLMs) such as BERT and RoBERTa have revolutionized the field of Natural Language Understanding in the past few years. However, existing pre-trained MLMs often output an anisotropic distribution of token representations that occupies a narrow subset of the entire representation space. Such token representations are not ideal, especially for tasks that demand discriminative semantic meanings of distinct tokens. In this work, we propose TaCL (Token-aware Contrastive Learning), a novel continual pre-training approach that encourages BERT to learn an isotropic and discriminative distribution of token representations. TaCL is fully unsupervised and requires no additional data. We extensively test our approach on a wide range of English and Chinese benchmarks. The results show that TaCL brings consistent and notable improvements over the original BERT model. Furthermore, we conduct detailed analysis to reveal the merits and inner-workings of our approach. △ Less

Submitted 28 April, 2022; v1 submitted 7 November, 2021; originally announced November 2021.

Comments: Camera-ready for NAACL 2022

arXiv:2111.02724 [pdf]

Tea Chrysanthemum Detection under Unstructured Environments Using the TC-YOLO Model

Authors: Chao Qi, Junfeng Gao, Simon Pearson, Helen Harman, Kunjie Chen, Lei Shu

Abstract: Tea chrysanthemum detection at its flowering stage is one of the key components for selective chrysanthemum harvesting robot development. However, it is a challenge to detect flowering chrysanthemums under unstructured field environments given the variations on illumination, occlusion and object scale. In this context, we propose a highly fused and lightweight deep learning architecture based on Y… ▽ More Tea chrysanthemum detection at its flowering stage is one of the key components for selective chrysanthemum harvesting robot development. However, it is a challenge to detect flowering chrysanthemums under unstructured field environments given the variations on illumination, occlusion and object scale. In this context, we propose a highly fused and lightweight deep learning architecture based on YOLO for tea chrysanthemum detection (TC-YOLO). First, in the backbone component and neck component, the method uses the Cross-Stage Partially Dense Network (CSPDenseNet) as the main network, and embeds custom feature fusion modules to guide the gradient flow. In the final head component, the method combines the recursive feature pyramid (RFP) multiscale fusion reflow structure and the Atrous Spatial Pyramid Pool (ASPP) module with cavity convolution to achieve the detection task. The resulting model was tested on 300 field images, showing that under the NVIDIA Tesla P100 GPU environment, if the inference speed is 47.23 FPS for each image (416 * 416), TC-YOLO can achieve the average precision (AP) of 92.49% on our own tea chrysanthemum dataset. In addition, this method (13.6M) can be deployed on a single mobile GPU, and it could be further developed as a perception system for a selective chrysanthemum harvesting robot in the future. △ Less

Submitted 4 November, 2021; originally announced November 2021.

arXiv:2109.14739 [pdf, other]

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai, Yi Zhang

Abstract: Pre-trained language models have been recently shown to benefit task-oriented dialogue (TOD) systems. Despite their success, existing methods often formulate this task as a cascaded generation problem which can lead to error accumulation across different sub-tasks and greater data annotation overhead. In this study, we present PPTOD, a unified plug-and-play model for task-oriented dialogue. In add… ▽ More Pre-trained language models have been recently shown to benefit task-oriented dialogue (TOD) systems. Despite their success, existing methods often formulate this task as a cascaded generation problem which can lead to error accumulation across different sub-tasks and greater data annotation overhead. In this study, we present PPTOD, a unified plug-and-play model for task-oriented dialogue. In addition, we introduce a new dialogue multi-task pre-training strategy that allows the model to learn the primary TOD task completion skills from heterogeneous dialog corpora. We extensively test our model on three benchmark TOD tasks, including end-to-end dialogue modelling, dialogue state tracking, and intent classification. Experimental results show that PPTOD achieves new state of the art on all evaluated tasks in both high-resource and low-resource scenarios. Furthermore, comparisons against previous SOTA methods show that the responses generated by PPTOD are more factually correct and semantically coherent as judged by human annotators. △ Less

Submitted 1 March, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: Camera-ready for ACL2022 main conference

arXiv:2109.02748 [pdf, other]

doi 10.1609/aaai.v36i6.20610

Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP

Authors: Sepideh Esmaeilpour, Bing Liu, Eric Robertson, Lei Shu

Abstract: In an out-of-distribution (OOD) detection problem, samples of known classes(also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and also (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper stu… ▽ More In an out-of-distribution (OOD) detection problem, samples of known classes(also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and also (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper studies the problem of zero-shot out-of-distribution(OOD) detection, which still performs the same two tasks in testing but has no training except using the given known class names. This paper proposes a novel yet simple method (called ZOC) to solve the problem. ZOC builds on top of the recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained language-vision model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and candidate unknown class names for zero-shot OOD detection. Experimental results on 5 benchmark datasets for OOD detection demonstrate that ZOC outperforms the baselines by a large margin. △ Less

Submitted 22 March, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

arXiv:2011.00169 [pdf, other]

Understanding Pre-trained BERT for Aspect-based Sentiment Analysis

Authors: Hu Xu, Lei Shu, Philip S. Yu, Bing Liu

Abstract: This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based language models for ABSA. However, it is not clear how the general proxy task of (masked) language model trained on unlabeled corpus without annotations of aspects or opinions can provide important fe… ▽ More This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based language models for ABSA. However, it is not clear how the general proxy task of (masked) language model trained on unlabeled corpus without annotations of aspects or opinions can provide important features for downstream tasks in ABSA. By leveraging the annotated datasets in ABSA, we investigate both the attentions and the learned representations of BERT pre-trained on reviews. We found that BERT uses very few self-attention heads to encode context words (such as prepositions or pronouns that indicating an aspect) and opinion words for an aspect. Most features in the representation of an aspect are dedicated to the fine-grained semantics of the domain (or product category) and the aspect itself, instead of carrying summarized opinions from its context. We hope this investigation can help future research in improving self-supervised learning, unsupervised learning and fine-tuning for ABSA. The pre-trained model and code can be found at https://github.com/howardhsu/BERT-for-RRC-ABSA. △ Less

Submitted 30 October, 2020; originally announced November 2020.

Comments: COLING 2020

arXiv:2009.12046 [pdf, other]

Controllable Text Generation with Focused Variation

Authors: Lei Shu, Alexandros Papangelis, Yi-Chia Wang, Gokhan Tur, Hu Xu, Zhaleh Feizollahi, Bing Liu, Piero Molino

Abstract: This work introduces Focused-Variation Network (FVN), a novel model to control language generation. The main problems in previous controlled language generation models range from the difficulty of generating text according to the given attributes, to the lack of diversity of the generated texts. FVN addresses these issues by learning disjoint discrete latent spaces for each attribute inside codebo… ▽ More This work introduces Focused-Variation Network (FVN), a novel model to control language generation. The main problems in previous controlled language generation models range from the difficulty of generating text according to the given attributes, to the lack of diversity of the generated texts. FVN addresses these issues by learning disjoint discrete latent spaces for each attribute inside codebooks, which allows for both controllability and diversity, while at the same time generating fluent text. We evaluate FVN on two text generation datasets with annotated content and style, and show state-of-the-art performance as assessed by automatic and human evaluations. △ Less

Submitted 25 September, 2020; originally announced September 2020.

arXiv:2004.13816 [pdf, other]

DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis

Authors: Hu Xu, Bing Liu, Lei Shu, Philip S. Yu

Abstract: This paper focuses on learning domain-oriented language models driven by end tasks, which aims to combine the worlds of both general-purpose language models (such as ELMo and BERT) and domain-specific language understanding. We propose DomBERT, an extension of BERT to learn from both in-domain corpus and relevant domain corpora. This helps in learning domain language models with low-resources. Exp… ▽ More This paper focuses on learning domain-oriented language models driven by end tasks, which aims to combine the worlds of both general-purpose language models (such as ELMo and BERT) and domain-specific language understanding. We propose DomBERT, an extension of BERT to learn from both in-domain corpus and relevant domain corpora. This helps in learning domain language models with low-resources. Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis, demonstrating promising results. △ Less

Submitted 28 April, 2020; originally announced April 2020.

arXiv:1912.07794 [pdf, other]

doi 10.1109/COMST.2020.3004197

Towards Smart Wireless Communications via Intelligent Reflecting Surfaces: A Contemporary Survey

Authors: Shimin Gong, Xiao Lu, Dinh Thai Hoang, Dusit Niyato, Lei Shu, Dong In Kim, Ying-Chang Liang

Abstract: This paper presents a literature review on recent applications and design aspects of the intelligent reflecting surface (IRS) in the future wireless networks. Conventionally, the network optimization has been limited to transmission control at two endpoints, i.e., end users and network controller. The fading wireless channel is uncontrollable and becomes one of the main limiting factors for perfor… ▽ More This paper presents a literature review on recent applications and design aspects of the intelligent reflecting surface (IRS) in the future wireless networks. Conventionally, the network optimization has been limited to transmission control at two endpoints, i.e., end users and network controller. The fading wireless channel is uncontrollable and becomes one of the main limiting factors for performance improvement. The IRS is composed of a large array of scattering elements, which can be individually configured to generate additional phase shifts to the signal reflections. Hence, it can actively control the signal propagation properties in favor of signal reception, and thus realize the notion of a smart radio environment. As such, the IRS's phase control, combined with the conventional transmission control, can potentially bring performance gain compared to wireless networks without IRS. In this survey, we first introduce basic concepts of the IRS and the realizations of its reconfigurability. Then, we focus on applications of the IRS in wireless communications. We overview different performance metrics and analytical approaches to characterize the performance improvement of IRS-assisted wireless networks. To exploit the performance gain, we discuss the joint optimization of the IRS's phase control and the transceivers' transmission control in different network design problems, e.g.,~rate maximization and power minimization problems. Furthermore, we extend the discussion of IRS-assisted wireless networks to some emerging use cases. Finally, we highlight important practical challenges and future research directions for realizing IRS-assisted wireless networks in beyond 5G communications. △ Less

Submitted 19 May, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

Comments: 31 pages, 10 figures, 6 tables, 203 references

Journal ref: IEEE Communications Surveys & Tutorials, June 2020

arXiv:1911.01460 [pdf, ps, other]

A Failure of Aspect Sentiment Classifiers and an Adaptive Re-weighting Solution

Authors: Hu Xu, Bing Liu, Lei Shu, Philip S. Yu

Abstract: Aspect-based sentiment classification (ASC) is an important task in fine-grained sentiment analysis.~Deep supervised ASC approaches typically model this task as a pair-wise classification task that takes an aspect and a sentence containing the aspect and outputs the polarity of the aspect in that sentence. However, we discovered that many existing approaches fail to learn an effective ASC classifi… ▽ More Aspect-based sentiment classification (ASC) is an important task in fine-grained sentiment analysis.~Deep supervised ASC approaches typically model this task as a pair-wise classification task that takes an aspect and a sentence containing the aspect and outputs the polarity of the aspect in that sentence. However, we discovered that many existing approaches fail to learn an effective ASC classifier but more like a sentence-level sentiment classifier because they have difficulty to handle sentences with different polarities for different aspects.~This paper first demonstrates this problem using several state-of-the-art ASC models. It then proposes a novel and general adaptive re-weighting (ARW) scheme to adjust the training to dramatically improve ASC for such complex sentences. Experimental results show that the proposed framework is effective \footnote{The dataset and code are available at \url{https://github.com/howardhsu/ASC_failure}.}. △ Less

Submitted 4 November, 2019; originally announced November 2019.

arXiv:1911.01172 [pdf]

Fast-UAP: An Algorithm for Speeding up Universal Adversarial Perturbation Generation with Orientation of Perturbation Vectors

Authors: Jiazhu Dai, Le Shu

Abstract: Convolutional neural networks (CNN) have become one of the most popular machine learning tools and are being applied in various tasks, however, CNN models are vulnerable to universal perturbations, which are usually human-imperceptible but can cause natural images to be misclassified with high probability. One of the state-of-the-art algorithms to generate universal perturbations is known as UAP.… ▽ More Convolutional neural networks (CNN) have become one of the most popular machine learning tools and are being applied in various tasks, however, CNN models are vulnerable to universal perturbations, which are usually human-imperceptible but can cause natural images to be misclassified with high probability. One of the state-of-the-art algorithms to generate universal perturbations is known as UAP. UAP only aggregates the minimal perturbations in every iteration, which will lead to generated universal perturbation whose magnitude cannot rise up efficiently and cause a slow generation. In this paper, we proposed an optimized algorithm to improve the performance of crafting universal perturbations based on orientation of perturbation vectors. At each iteration, instead of choosing minimal perturbation vector with respect to each image, we aggregate the current instance of universal perturbation with the perturbation which has similar orientation to the former so that the magnitude of the aggregation will rise up as large as possible at every iteration. The experiment results show that we get universal perturbations in a shorter time and with a smaller number of training images. Furthermore, we observe in experiments that universal perturbations generated by our proposed algorithm have an average increment of fooling rate by 9% in white-box attacks and black-box attacks comparing with universal perturbations generated by UAP. △ Less

Submitted 6 January, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: 9 pages, 7 figures, 1 table, 1 algorithm

MSC Class: I.2.0 ACM Class: I.2.0

arXiv:1908.11546 [pdf, other]

Modeling Multi-Action Policy for Task-Oriented Dialogues

Authors: Lei Shu, Hu Xu, Bing Liu, Piero Molino

Abstract: Dialogue management (DM) plays a key role in the quality of the interaction with the user in a task-oriented dialogue system. In most existing approaches, the agent predicts only one DM policy action per turn. This significantly limits the expressive power of the conversational agent and introduces unwanted turns of interactions that may challenge users' patience. Longer conversations also lead to… ▽ More Dialogue management (DM) plays a key role in the quality of the interaction with the user in a task-oriented dialogue system. In most existing approaches, the agent predicts only one DM policy action per turn. This significantly limits the expressive power of the conversational agent and introduces unwanted turns of interactions that may challenge users' patience. Longer conversations also lead to more errors and the system needs to be more robust to handle them. In this paper, we compare the performance of several models on the task of predicting multiple acts for each turn. A novel policy model is proposed based on a recurrent cell called gated Continue-Act-Slots (gCAS) that overcomes the limitations of the existing models. Experimental results show that gCAS outperforms other approaches. The code is available at https://leishu02.github.io/ △ Less

Submitted 30 August, 2019; originally announced August 2019.

Comments: 7

arXiv:1908.02402 [pdf, other]

Flexibly-Structured Model for Task-Oriented Dialogues

Authors: Lei Shu, Piero Molino, Mahdi Namazifar, Hu Xu, Bing Liu, Huaixiu Zheng, Gokhan Tur

Abstract: This paper proposes a novel end-to-end architecture for task-oriented dialogue systems. It is based on a simple and practical yet very effective sequence-to-sequence approach, where language understanding and state tracking tasks are modeled jointly with a structured copy-augmented sequential decoder and a multi-label decoder for each slot. The policy engine and language generation tasks are model… ▽ More This paper proposes a novel end-to-end architecture for task-oriented dialogue systems. It is based on a simple and practical yet very effective sequence-to-sequence approach, where language understanding and state tracking tasks are modeled jointly with a structured copy-augmented sequential decoder and a multi-label decoder for each slot. The policy engine and language generation tasks are modeled jointly following that. The copy-augmented sequential decoder deals with new or unknown values in the conversation, while the multi-label decoder combined with the sequential decoder ensures the explicit assignment of values to slots. On the generation part, slot binary classifiers are used to improve performance. This architecture is scalable to real-world scenarios and is shown through an empirical evaluation to achieve state-of-the-art performance on both the Cambridge Restaurant dataset and the Stanford in-car assistant dataset\footnote{The code is available at \url{https://github.com/uber-research/FSDM}} △ Less

Submitted 6 August, 2019; originally announced August 2019.

arXiv:1905.06407 [pdf, other]

Controlled CNN-based Sequence Labeling for Aspect Extraction

Authors: Lei Shu, Hu Xu, Bing Liu

Abstract: One key task of fine-grained sentiment analysis on reviews is to extract aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using a modified CNN called controlled CNN (Ctrl). The modified CNN has two types of control modules. Through asynchronous parameter updating, it prevents over-fitting and boosts CNN's performance significantly. This… ▽ More One key task of fine-grained sentiment analysis on reviews is to extract aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using a modified CNN called controlled CNN (Ctrl). The modified CNN has two types of control modules. Through asynchronous parameter updating, it prevents over-fitting and boosts CNN's performance significantly. This model achieves state-of-the-art results on standard aspect extraction datasets. To the best of our knowledge, this is the first paper to apply control modules to aspect extraction. △ Less

Submitted 29 May, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

arXiv:1904.02232 [pdf, other]

BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis

Authors: Hu Xu, Bing Liu, Lei Shu, Philip S. Yu

Abstract: Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploite… ▽ More Question-answering plays an important role in e-commerce as it allows potential customers to actively seek crucial information about products or services to help their purchase decision making. Inspired by the recent success of machine reading comprehension (MRC) on formal documents, this paper explores the potential of turning customer reviews into a large source of knowledge that can be exploited to answer user questions.~We call this problem Review Reading Comprehension (RRC). To the best of our knowledge, no existing work has been done on RRC. In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. Since ReviewRC has limited training examples for RRC (and also for aspect-based sentiment analysis), we then explore a novel post-training approach on the popular language model BERT to enhance the performance of fine-tuning of BERT for RRC. To show the generality of the approach, the proposed post-training is also applied to some other review-based tasks such as aspect extraction and aspect sentiment classification in aspect-based sentiment analysis. Experimental results demonstrate that the proposed post-training is highly effective. The datasets and code are available at https://www.cs.uic.edu/~hxu/. △ Less

Submitted 3 May, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: accepted by NAACL 2019

arXiv:1902.00821 [pdf, ps, other]

Review Conversational Reading Comprehension

Authors: Hu Xu, Bing Liu, Lei Shu, Philip S. Yu

Abstract: Inspired by conversational reading comprehension (CRC), this paper studies a novel task of leveraging reviews as a source to build an agent that can answer multi-turn questions from potential consumers of online businesses. We first build a review CRC dataset and then propose a novel task-aware pre-tuning step running between language model (e.g., BERT) pre-training and domain-specific fine-tuning… ▽ More Inspired by conversational reading comprehension (CRC), this paper studies a novel task of leveraging reviews as a source to build an agent that can answer multi-turn questions from potential consumers of online businesses. We first build a review CRC dataset and then propose a novel task-aware pre-tuning step running between language model (e.g., BERT) pre-training and domain-specific fine-tuning. The proposed pre-tuning requires no data annotation, but can greatly enhance the performance on our end task. Experimental results show that the proposed approach is highly effective and has competitive performance as the supervised approach. The dataset is available at \url{https://github.com/howardhsu/RCRC} △ Less

Submitted 6 November, 2019; v1 submitted 2 February, 2019; originally announced February 2019.

arXiv:1809.06004 [pdf, other]

doi 10.1145/3308558.3313644

Open-world Learning and Application to Product Classification

Authors: Hu Xu, Bing Liu, Lei Shu, P. Yu

Abstract: Classic supervised learning makes the closed-world assumption, meaning that classes seen in testing must have been seen in training. However, in the dynamic world, new or unseen class examples may appear constantly. A model working in such an environment must be able to reject unseen classes (not seen or used in training). If enough data is collected for the unseen classes, the system should incre… ▽ More Classic supervised learning makes the closed-world assumption, meaning that classes seen in testing must have been seen in training. However, in the dynamic world, new or unseen class examples may appear constantly. A model working in such an environment must be able to reject unseen classes (not seen or used in training). If enough data is collected for the unseen classes, the system should incrementally learn to accept/classify them. This learning paradigm is called open-world learning (OWL). Existing OWL methods all need some form of re-training to accept or include the new classes in the overall model. In this paper, we propose a meta-learning approach to the problem. Its key novelty is that it only needs to train a meta-classifier, which can then continually accept new classes when they have enough labeled data for the meta-classifier to use, and also detect/reject future unseen classes. No re-training of the meta-classifier or a new overall classifier covering all old and new classes is needed. In testing, the method only uses the examples of the seen classes (including the newly added classes) on-the-fly for classification and rejection. Experimental results demonstrate the effectiveness of the new approach. △ Less

Submitted 1 March, 2019; v1 submitted 16 September, 2018; originally announced September 2018.

Comments: accepted by The Web Conference (WWW 2019) Previous title: Learning to Accept New Classes without Training

arXiv:1805.09991 [pdf, other]

Lifelong Domain Word Embedding via Meta-Learning

Authors: Hu Xu, Bing Liu, Lei Shu, Philip S. Yu

Abstract: Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting… ▽ More Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when performing the new domain embedding, the system has seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from the past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word in many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced from such a process improve the performance of the downstream tasks. △ Less

Submitted 25 May, 2018; originally announced May 2018.

Comments: 7 pages

Journal ref: IJCAI 2018

arXiv:1805.04601 [pdf, other]

Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction

Authors: Hu Xu, Bing Liu, Lei Shu, Philip S. Yu

Abstract: One key task of fine-grained sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using deep learning. Unlike other highly sophisticated supervised deep learning models, this paper proposes a novel and yet simple CNN model employing two types of pre-trained embeddings for aspect ext… ▽ More One key task of fine-grained sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using deep learning. Unlike other highly sophisticated supervised deep learning models, this paper proposes a novel and yet simple CNN model employing two types of pre-trained embeddings for aspect extraction: general-purpose embeddings and domain-specific embeddings. Without using any additional supervision, this model achieves surprisingly good results, outperforming state-of-the-art sophisticated existing methods. To our knowledge, this paper is the first to report such double embeddings based CNN model for aspect extraction and achieve very good results. △ Less

Submitted 11 May, 2018; originally announced May 2018.

Comments: ACL 2018

arXiv:1804.07942 [pdf, other]

Generative Stock Question Answering

Authors: Zhaopeng Tu, Yong Jiang, Xiaojiang Liu, Lei Shu, Shuming Shi

Abstract: We study the problem of stock related question answering (StockQA): automatically generating answers to stock related questions, just like professional stock analysts providing action recommendations to stocks upon user's requests. StockQA is quite different from previous QA tasks since (1) the answers in StockQA are natural language sentences (rather than entities or values) and due to the dynami… ▽ More We study the problem of stock related question answering (StockQA): automatically generating answers to stock related questions, just like professional stock analysts providing action recommendations to stocks upon user's requests. StockQA is quite different from previous QA tasks since (1) the answers in StockQA are natural language sentences (rather than entities or values) and due to the dynamic nature of StockQA, it is scarcely possible to get reasonable answers in an extractive way from the training data; and (2) StockQA requires properly analyzing the relationship between keywords in QA pair and the numerical features of a stock. We propose to address the problem with a memory-augmented encoder-decoder architecture, and integrate different mechanisms of number understanding and generation, which is a critical component of StockQA. We build a large-scale dataset containing over 180K StockQA instances, based on which various technique combinations are extensively studied and compared. Experimental results show that a hybrid word-character model with separate character components for number processing, achieves the best performance. By analyzing the results, we found that 44.8% of answers generated by our best model still suffer from the generic answer problem, which can be alleviated by a straightforward hybrid retrieval-generation model. △ Less

Submitted 20 September, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

Comments: data: http://ai.tencent.com/ailab/nlp/data/stockQA.tar.gz

arXiv:1804.00976 [pdf, ps, other]

On Attractors of Isospectral Compressions of Networks

Authors: Leonid Bunimovich, Longmei Shu

Abstract: In the recently developed theory of isospectral transformations of networks isospectral compressions are performed with respect to some chosen characteristic (attribute) of nodes (or edges) of networks. Each isospectral compression (when a certain characteristic is fixed) defines a dynamical system on the space of all networks. It is shown that any orbit of such dynamical system which starts at an… ▽ More In the recently developed theory of isospectral transformations of networks isospectral compressions are performed with respect to some chosen characteristic (attribute) of nodes (or edges) of networks. Each isospectral compression (when a certain characteristic is fixed) defines a dynamical system on the space of all networks. It is shown that any orbit of such dynamical system which starts at any finite network (as the initial point of this orbit) converges to an attractor. Such attractor is a smaller network where a chosen characteristic has the same value for all nodes (or edges). We demonstrate that isospectral contractions of one and the same network defined by different characteristics of nodes (or edges) may converge to the same as well as to different attractors. It is also shown that spectrally equivalent with respect to some characteristic networks could be non-spectrally equivalent for another characteristic of nodes (edges). These results suggest a new constructive approach to analysis of networks structures and to comparison of topologies of different networks. △ Less

Submitted 30 March, 2018; originally announced April 2018.

Comments: arXiv admin note: text overlap with arXiv:1802.03410

MSC Class: 05C50; 15A18

arXiv:1801.05609 [pdf, other]

Unseen Class Discovery in Open-world Classification

Authors: Lei Shu, Hu Xu, Bing Liu

Abstract: This paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult.… ▽ More This paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult. However, we do have the data from the seen training classes, which can tell us what kind of similarity/difference is expected for examples from the same class or from different classes. It is reasonable to assume that this knowledge can be transferred to the rejected examples and used to discover the hidden unseen classes in them. This paper aims to solve this problem. It first proposes a joint open classification model with a sub-model for classifying whether a pair of examples belongs to the same or different classes. This sub-model can serve as a distance function for clustering to discover the hidden classes of the rejected examples. Experimental results show that the proposed model is highly promising. △ Less

Submitted 17 January, 2018; originally announced January 2018.

arXiv:1712.02186 [pdf, other]

Product Function Need Recognition via Semi-supervised Attention Network

Authors: Hu Xu, Sihong Xie, Lei Shu, Philip S. Yu

Abstract: Functionality is of utmost importance to customers when they purchase products. However, it is unclear to customers whether a product can really satisfy their needs on functions. Further, missing functions may be intentionally hidden by the manufacturers or the sellers. As a result, a customer needs to spend a fair amount of time before purchasing or just purchase the product on his/her own risk.… ▽ More Functionality is of utmost importance to customers when they purchase products. However, it is unclear to customers whether a product can really satisfy their needs on functions. Further, missing functions may be intentionally hidden by the manufacturers or the sellers. As a result, a customer needs to spend a fair amount of time before purchasing or just purchase the product on his/her own risk. In this paper, we first identify a novel QA corpus that is dense on product functionality information \footnote{The annotated corpus can be found at \url{https://www.cs.uic.edu/~hxu/}.}. We then design a neural network called Semi-supervised Attention Network (SAN) to discover product functions from questions. This model leverages unlabeled data as contextual information to perform semi-supervised sequence labeling. We conduct experiments to show that the extracted function have both high coverage and accuracy, compared with a wide spectrum of baselines. △ Less

Submitted 6 December, 2017; originally announced December 2017.

arXiv:1712.02016 [pdf, other]

Dual Attention Network for Product Compatibility and Function Satisfiability Analysis

Authors: Hu Xu, Sihong Xie, Lei Shu, Philip S. Yu

Abstract: Product compatibility and their functionality are of utmost importance to customers when they purchase products, and to sellers and manufacturers when they sell products. Due to the huge number of products available online, it is infeasible to enumerate and test the compatibility and functionality of every product. In this paper, we address two closely related problems: product compatibility analy… ▽ More Product compatibility and their functionality are of utmost importance to customers when they purchase products, and to sellers and manufacturers when they sell products. Due to the huge number of products available online, it is infeasible to enumerate and test the compatibility and functionality of every product. In this paper, we address two closely related problems: product compatibility analysis and function satisfiability analysis, where the second problem is a generalization of the first problem (e.g., whether a product works with another product can be considered as a special function). We first identify a novel question and answering corpus that is up-to-date regarding product compatibility and functionality information. To allow automatic discovery product compatibility and functionality, we then propose a deep learning model called Dual Attention Network (DAN). Given a QA pair for a to-be-purchased product, DAN learns to 1) discover complementary products (or functions), and 2) accurately predict the actual compatibility (or satisfiability) of the discovered products (or functions). The challenges addressed by the model include the briefness of QAs, linguistic patterns indicating compatibility, and the appropriate fusion of questions and answers. We conduct experiments to quantitatively and qualitatively show that the identified products and functions have both high coverage and accuracy, compared with a wide spectrum of baselines. △ Less

Submitted 5 December, 2017; originally announced December 2017.

arXiv:1709.08716 [pdf, other]

DOC: Deep Open Classification of Text Documents

Authors: Lei Shu, Hu Xu, Bing Liu

Abstract: Traditional supervised learning makes the closed-world assumption that the classes appeared in the test data must have appeared in training. This also applies to text learning or text classification. As learning is used increasingly in dynamic open environments where some new/test documents may not belong to any of the training classes, identifying these novel documents during classification prese… ▽ More Traditional supervised learning makes the closed-world assumption that the classes appeared in the test data must have appeared in training. This also applies to text learning or text classification. As learning is used increasingly in dynamic open environments where some new/test documents may not belong to any of the training classes, identifying these novel documents during classification presents an important problem. This problem is called open-world classification or open classification. This paper proposes a novel deep learning based approach. It outperforms existing state-of-the-art techniques dramatically. △ Less

Submitted 25 September, 2017; originally announced September 2017.

Comments: accepted at EMNLP 2017

arXiv:1707.00323 [pdf, other]

An improved isogeometric analysis method for trimmed geometries

Authors: Jinlan Xu, Ningning Sun, Laixin Shu, Timon Rabczuk, Gang Xu

Abstract: Trimming techniques are efficient ways to generate complex geometries in Computer-Aided Design(CAD). In this paper, an improved isogeometric analysis(IGA) method for trimmed geometries is proposed. We will show that the proposed method reduces the numerical error of physical solution by 50% for simple trimmed geometries, and the condition number of stiffness matrix is also decreased. Furthermore,… ▽ More Trimming techniques are efficient ways to generate complex geometries in Computer-Aided Design(CAD). In this paper, an improved isogeometric analysis(IGA) method for trimmed geometries is proposed. We will show that the proposed method reduces the numerical error of physical solution by 50% for simple trimmed geometries, and the condition number of stiffness matrix is also decreased. Furthermore, the number of integration elements and integration points involved in the solving process can be significantly reduced compared to previous approaches, drastically improving the computational efficiency for IGA problems on the trimmed geometry. Several examples are illustrated to show the effectiveness of the proposed approach. △ Less

Submitted 2 July, 2017; originally announced July 2017.

arXiv:1705.10030 [pdf, ps, other]

Supervised Complementary Entity Recognition with Augmented Key-value Pairs of Knowledge

Authors: Hu Xu, Lei Shu, Philip S. Yu

Abstract: Extracting opinion targets is an important task in sentiment analysis on product reviews and complementary entities (products) are one important type of opinion targets that may work together with the reviewed product. In this paper, we address the problem of Complementary Entity Recognition (CER) as a supervised sequence labeling with the capability of expanding domain knowledge as key-value pair… ▽ More Extracting opinion targets is an important task in sentiment analysis on product reviews and complementary entities (products) are one important type of opinion targets that may work together with the reviewed product. In this paper, we address the problem of Complementary Entity Recognition (CER) as a supervised sequence labeling with the capability of expanding domain knowledge as key-value pairs from unlabeled reviews, by automatically learning and enhancing knowledge-based features. We use Conditional Random Field (CRF) as the base learner and augment CRF with knowledge-based features (called the Knowledge-based CRF or KCRF for short). We conduct experiments to show that KCRF effectively improves the performance of supervised CER task. △ Less

Submitted 28 May, 2017; originally announced May 2017.

arXiv:1705.00251 [pdf, ps, other]

Lifelong Learning CRF for Supervised Aspect Extraction

Authors: Lei Shu, Hu Xu, Bing Liu

Abstract: This paper makes a focused contribution to supervised aspect extraction. It shows that if the system has performed aspect extraction from many past domains and retained their results as knowledge, Conditional Random Fields (CRF) can leverage this knowledge in a lifelong learning manner to extract in a new domain markedly better than the traditional CRF without using this prior knowledge. The key i… ▽ More This paper makes a focused contribution to supervised aspect extraction. It shows that if the system has performed aspect extraction from many past domains and retained their results as knowledge, Conditional Random Fields (CRF) can leverage this knowledge in a lifelong learning manner to extract in a new domain markedly better than the traditional CRF without using this prior knowledge. The key innovation is that even after CRF training, the model can still improve its extraction with experiences in its applications. △ Less

Submitted 29 April, 2017; originally announced May 2017.

Comments: Accepted at ACL 2017. arXiv admin note: text overlap with arXiv:1612.07940

arXiv:1612.07940 [pdf, ps, other]

Supervised Opinion Aspect Extraction by Exploiting Past Extraction Results

Authors: Lei Shu, Bing Liu, Hu Xu, Annice Kim

Abstract: One of the key tasks of sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. In this work, we focus on using supervised sequence labeling as the base approach to performing the task. Although several extraction methods using sequence labeling methods such as Conditional Random Fields (CRF) and Hidden Markov Models (HMM) have been pr… ▽ More One of the key tasks of sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. In this work, we focus on using supervised sequence labeling as the base approach to performing the task. Although several extraction methods using sequence labeling methods such as Conditional Random Fields (CRF) and Hidden Markov Models (HMM) have been proposed, we show that this supervised approach can be significantly improved by exploiting the idea of concept sharing across multiple domains. For example, "screen" is an aspect in iPhone, but not only iPhone has a screen, many electronic devices have screens too. When "screen" appears in a review of a new domain (or product), it is likely to be an aspect too. Knowing this information enables us to do much better extraction in the new domain. This paper proposes a novel extraction method exploiting this idea in the context of supervised sequence labeling. Experimental results show that it produces markedly better results than without using the past information. △ Less

Submitted 23 December, 2016; originally announced December 2016.

Comments: 10 pages

arXiv:1612.04499 [pdf, other]

Mining Compatible/Incompatible Entities from Question and Answering via Yes/No Answer Classification using Distant Label Expansion

Authors: Hu Xu, Lei Shu, Jingyuan Zhang, Philip S. Yu

Abstract: Product Community Question Answering (PCQA) provides useful information about products and their features (aspects) that may not be well addressed by product descriptions and reviews. We observe that a product's compatibility issues with other products are frequently discussed in PCQA and such issues are more frequently addressed in accessories, i.e., via a yes/no question "Does this mouse work wi… ▽ More Product Community Question Answering (PCQA) provides useful information about products and their features (aspects) that may not be well addressed by product descriptions and reviews. We observe that a product's compatibility issues with other products are frequently discussed in PCQA and such issues are more frequently addressed in accessories, i.e., via a yes/no question "Does this mouse work with windows 10?". In this paper, we address the problem of extracting compatible and incompatible products from yes/no questions in PCQA. This problem can naturally have a two-stage framework: first, we perform Complementary Entity (product) Recognition (CER) on yes/no questions; second, we identify the polarities of yes/no answers to assign the complementary entities a compatibility label (compatible, incompatible or unknown). We leverage an existing unsupervised method for the first stage and a 3-class classifier by combining a distant PU-learning method (learning from positive and unlabeled examples) together with a binary classifier for the second stage. The benefit of using distant PU-learning is that it can help to expand more implicit yes/no answers without using any human annotated data. We conduct experiments on 4 products to show that the proposed method is effective. △ Less

Submitted 14 December, 2016; originally announced December 2016.

Comments: 9 pages, 1 figures

Showing 1–50 of 58 results for author: Shu, L