Xiangdong Su

2024

pdf bib abs
Learning Low-dimensional Multi-domain Knowledge Graph Embedding via Dual Archimedean Spirals
Jiang Li | Xiangdong Su | Fujun Zhang | Guanglai Gao
Findings of the Association for Computational Linguistics: ACL 2024

Knowledge graph embedding (KGE) is extensively employed for link prediction by representing entities and relations as low-dimensional vectors. In real-world scenarios, knowledge graphs (KGs) usually encompass diverse domains, which poses challenges to KG representations. However, existing KGE methods rarely make domain constraints on the embedding distribution of multi-domain KGs, leading to the embedding overlapping of different domains and performance degradation of link prediction. To address this challenge, we propose Dual Archimedean Spiral Knowledge Graph Embedding (DuASE), a low-dimensional KGE model for multi-domain KGs. DuASE is inspired by our discovery that relation types can distinguish entities from different domains. Specifically, DuASE encodes entities with the same relation on the same Archimedean spiral, allowing it to differentiate the entities from different domains. To avoid embedding overlapping across domains, DuASE further makes the head and the tail spirals in the same triplet cluster to their respective domain space by a regularization function. Thus, DuASE can better capture the domain information and the dependencies between entities when modeling the multi-domain KGs, leading to improved KG representations. We validate the effectiveness of DuASE on the novel multi-domain dataset (n-MDKG) introduced in this study and three other benchmark datasets.

Large Language Models (LLMs) exhibit impressive capabilities but also present risks such as biased content generation and privacy issues. One of the current alignment techniques includes principle-driven integration, but it faces challenges arising from the imprecision of manually crafted rules and inadequate risk perception in models without safety training. To address these, we introduce Guide-Align, a two-stage approach. Initially, a safety-trained model identifies potential risks and formulates specific guidelines for various inputs, establishing a comprehensive library of guidelines and a model for input-guidelines retrieval. Subsequently, the retrieval model correlates new inputs with relevant guidelines, which guide LLMs in response generation to ensure safe and high-quality outputs, thereby aligning with human values. An additional optional stage involves fine-tuning a model with well-aligned datasets generated through the process implemented in the second stage.Our method customizes guidelines to accommodate diverse inputs, thereby enhancing the fine-grainedness and comprehensiveness of the guideline library. Furthermore, it incorporates safety expertise from a safety-trained LLM through a lightweight retrieval model.We evaluate our approach on three benchmarks, demonstrating significant improvements in LLM security and quality. Notably, our fine-tuned model, Labrador, even at 13 billion parameters, outperforms GPT-3.5-turbo and surpasses GPT-4 in alignment capabilities.

pdf bib abs
APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning
Jiashuo Sun | Hang Zhang | Chen Lin | Xiangdong Su | Yeyun Gong | Jian Guo
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Long-form numerical reasoning aims to generate a reasoning program to calculate the answer for a given question. Previous work followed a retriever-generator framework, where the retriever selects key facts from a long-form document, and the generator generates a reasoning program based on the retrieved facts. However, they treated all facts equally without considering the different contributions of facts with and without numerical information. Furthermore, they ignored program consistency, leading to the wrong punishment of programs that differed from the ground truth. In order to address these issues, we proposed APOLLO (An optimized training aPproach fOr Long-form numericaL reasOning), to improve long-form numerical reasoning. APOLLO includes a number-aware negative sampling strategy for the retriever to discriminate key numerical facts, and a consistency-based reinforcement learning with target program augmentation for the generator to ultimately increase the execution accuracy. Experimental results on the FinQA and ConvFinQA leaderboards verify the effectiveness of our proposed methods, achieving the new state-of-the-art.

pdf bib abs
EpLSA: Synergy of Expert-prefix Mixtures and Task-Oriented Latent Space Adaptation for Diverse Generative Reasoning
Fujun Zhang | Xiangdong Su | Jiang Li | Rong Yan | Guanglai Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Existing models for diverse generative reasoning still struggle to generate multiple unique and plausible results. Through an in-depth examination, we argue that it is critical to leverage a mixture of experts as prefixes to enhance the diversity of generated results and make task-oriented adaptation in the latent space of the generation models to improve the quality of the responses. At this point, we propose EpLSA, an innovative model based on the synergy of expert-prefix mixtures and task-oriented latent space adaptation for diverse generative reasoning. Specifically, we use expert-prefixes mixtures to encourage the model to create multiple responses with different semantics and design a loss function to address the problem that the semantics is interfered by the expert-prefixes. Meanwhile, we design a task-oriented adaptation block to make the pre-trained encoder within the generation model more effectively adapted to the pre-trained decoder in the latent space, thus further improving the quality of the generated text. Extensive experiments on three different types of generative reasoning tasks demonstrate that EpLSA outperforms existing baseline models in terms of both the quality and diversity of the generated outputs. Our code is publicly available at https://github.com/IMU-MachineLearningSXD/EpLSA.

pdf bib abs
Exploring the Synergy of Dual-path Encoder and Alignment Module for Better Graph-to-Text Generation
Tianxin Zhao | Yingxin Liu | Xiangdong Su | Jiang Li | Guanglai Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The mainstream approaches view the knowledge graph-to-text (KG-to-text) generation as a sequence-to-sequence task and fine-tune the pre-trained model (PLM) to generate the target text from the linearized knowledge graph. However, the linearization of knowledge graphs and the structure of PLMs lead to the loss of a large amount of graph structure information. Moreover, PLMs lack an explicit graph-text alignment strategy because of the discrepancy between structural and textual information. To solve these two problems, we propose a synergetic KG-to-text model with a dual-path encoder, an alignment module, and a guidance module. The dual-path encoder consists of a graph structure encoder and a text encoder, which can better encode the structure and text information of the knowledge graph. The alignment module contains a two-layer Transformer block and an MLP block, which aligns and integrates the information from the dual encoder. The guidance module combines an improved pointer network and an MLP block to avoid error-generated entities and ensures the fluency and accuracy of the generated text. Our approach obtains very competitive performance on three benchmark datasets. Our code is available from https://github.com/IMu-MachineLearningsxD/G2T.

pdf bib abs
Hyperbolic Representations for Prompt Learning
Nan Chen | Xiangdong Su | Feilong Bao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Continuous prompt tuning has gained significant attention for its ability to train only continuous prompts while freezing the language model. This approach greatly reduces the training time and storage for downstream tasks. In this work, we delve into the hierarchical relationship between the prompts and downstream text inputs. In prompt learning, the prefix prompt acts as a module to guide the downstream language model, establishing a hierarchical relationship between the prefix prompt and subsequent inputs. Furthermore, we explore the benefits of leveraging hyperbolic space for modeling hierarchical structures. We project representations of pre-trained models from Euclidean space into hyperbolic space using the Poincaré disk which effectively captures the hierarchical relationship between the prompt and input text. The experiments on natural language understanding (NLU) tasks illustrate that hyperbolic space can model the hierarchical relationship between prompt and text input. We release our code at https://github.com/myaxxxxx/Hyperbolic-Prompt-Learning.

pdf bib abs
TransERR: Translation-based Knowledge Graph Embedding via Efficient Relation Rotation
Jiang Li | Xiangdong Su | Fujun Zhang | Guanglai Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a translation-based knowledge geraph embedding method via efficient relation rotation (TransERR), a straightforward yet effective alternative to traditional translation-based knowledge graph embedding models. Different from the previous translation-based models, TransERR encodes knowledge graphs in the hypercomplex-valued space, thus enabling it to possess a higher degree of translation freedom in mining latent information between the head and tail entities. To further minimize the translation distance, TransERR adaptively rotates the head entity and the tail entity with their corresponding unit quaternions, which are learnable in model training. We also provide mathematical proofs to demonstrate the ability of TransERR in modeling various relation patterns, including symmetry, antisymmetry, inversion, composition, and subrelation patterns. The experiments on 10 benchmark datasets validate the effectiveness and the generalization of TransERR. The results also indicate that TransERR can better encode large-scale datasets with fewer parameters than the previous translation-based models. Our code and datasets are available at https://github.com/dellixx/TransERR.

2023

pdf bib abs
TeAST: Temporal Knowledge Graph Embedding via Archimedean Spiral Timeline
Jiang Li | Xiangdong Su | Guanglai Gao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Temporal knowledge graph embedding (TKGE) models are commonly utilized to infer the missing facts and facilitate reasoning and decision-making in temporal knowledge graph based systems. However, existing methods fuse temporal information into entities, potentially leading to the evolution of entity information and limiting the link prediction performance of TKG. Meanwhile, current TKGE models often lack the ability to simultaneously model important relation patterns and provide interpretability, which hinders their effectiveness and potential applications. To address these limitations, we propose a novel TKGE model which encodes Temporal knowledge graph embeddings via Archimedean Spiral Timeline (TeAST), which maps relations onto the corresponding Archimedean spiral timeline and transforms the quadruples completion to 3th-order tensor completion problem. Specifically, the Archimedean spiral timeline ensures that relations that occur simultaneously are placed on the same timeline, and all relations evolve over time. Meanwhile, we present a novel temporal spiral regularizer to make the spiral timeline orderly. In addition, we provide mathematical proofs to demonstrate the ability of TeAST to encode various relation patterns. Experimental results show that our proposed model significantly outperforms existing TKGE methods. Our code is available at https://github.com/IMU-MachineLearningSXD/TeAST.

pdf bib abs
How Well Apply Simple MLP to Incomplete Utterance Rewriting?
Jiang Li | Xiangdong Su | Xinlan Ma | Guanglai Gao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Incomplete utterance rewriting (IUR) aims to restore the incomplete utterance with sufficient context information for comprehension. This paper introduces a simple yet efficient IUR method. Different from prior studies, we first employ only one-layer MLP architecture to mine latent semantic information between joint utterances for IUR task (MIUR). After that, we conduct a joint feature matrix to predict the token type and thus restore the incomplete utterance. The well-designed network and simple architecture make our method significantly superior to existing methods in terms of quality and inference speedOur code is available at https://github.com/IMU-MachineLearningSXD/MIUR.

2020

pdf bib abs
Incorporating Inner-word and Out-word Features for Mongolian Morphological Segmentation
Na Liu | Xiangdong Su | Haoran Zhang | Guanglai Gao | Feilong Bao
Proceedings of the 28th International Conference on Computational Linguistics

Mongolian morphological segmentation is regarded as a crucial preprocessing step in many Mongolian related NLP applications and has received extensive attention. Recently, end-to-end segmentation approaches with long short-term memory networks (LSTM) have achieved excellent results. However, the inner-word features among characters in the word and the out-word features from context are not well utilized in the segmentation process. In this paper, we propose a neural network incorporating inner-word and out-word features for Mongolian morphological segmentation. The network consists of two encoders and one decoder. The inner-word encoder uses the self-attention mechanisms to capture the inner-word features of the target word. The out-word encoder employs a two layers BiLSTM network to extract out-word features in the sentence. Then, the decoder adopts a multi-head double attention layer to fuse the inner-word features and out-word features and produces the segmentation result. The evaluation experiment compares the proposed network with the baselines and explores the effectiveness of the sub-modules.