Search | arXiv e-print repository

arXiv:2407.20155 [pdf, other]

GsPINN: A novel fast Green kernel solver based on symmetric Physics-Informed neural networks

Abstract: Ever since deep learning was introduced in the calculation of partial differential equations (PDEs), there has been a lot of interest in the real-time response of systems where the kernel function plays an important role. As a popular tool in recent years, physics-informed neural networks (PINNs) were proposed to perform mesh-free, semi-supervised learning with high flexibility. This paper explores the integration of Lie symmetry groups with deep learning techniques to enhance the numerical solution of fundamental solutions of PDEs. We propose a novel approach that combines the strengths of PINNs and Lie group theory to address the computational inefficiencies in traditional methods. By incorporating the linearized symmetric condition (LSC) derived from Lie symmetries into PINNs, we introduce a new type of residual loss that requires lower-order derivatives. This integration allows for significant reductions in computational costs and improvements in solution precision. Numerical simulations show that our method can achieve up to a 50% reduction in training time while maintaining good accuracy. Additionally, we provide a general theoretical framework to identify invariant infinitesimal generators for arbitrary Cauchy problems. This unsupervised algorithm does not require prior numerical solutions, making it both practical and efficient for various applications. Our contributions demonstrate the potential of combining symmetry analysis with deep learning to advance the field of scientific machine learning.

Submitted 29 July, 2024; originally announced July 2024.

Comments: 17 pages, 5 figures, 2 tables

arXiv:2407.14507 [pdf, other]

Internal Consistency and Self-Feedback in Large Language Models: A Survey

Authors: Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Feiyu Xiong, Zhiyu Li

Abstract: Large language models (LLMs) are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these, studies prefixed with "Self-" such as Self-Consistency, Self-Improve, and Self-Refine have been initiated. They share a commonality: involving LLMs evaluating and updating itself to mitigate the issues. Nonetheless, these efforts lack a unified perspective on summarization, as existing surveys predominantly focus on categorization without examining the motivations behind these works. In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as the lack of reasoning and the presence of hallucinations. Internal Consistency assesses the coherence among LLMs' latent layer, decoding layer, and response layer based on sampling methodologies. Expanding upon the Internal Consistency framework, we introduce a streamlined yet effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback framework consists of two modules: Self-Evaluation and Self-Update. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, "Does Self-Feedback Really Work?" We propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data, available at https://github.com/IAAR-Shanghai/ICSFSurvey.

Submitted 19 July, 2024; originally announced July 2024.

Comments: 27 pages, 9 figures, 10 tables, 14 equations

arXiv:2407.01654 [pdf, other]

A thermodynamically consistent phase-field lattice Boltzmann method for two-phase electrohydrodynamic flows

Authors: Fang Xiong, Lei Wang, Jiangxu Huang, Kang Luo

Abstract: In this work, we aim to develop a phase-field based lattice Boltzmann (LB) method for simulating two-phase electrohydrodynamics (EHD) flows, which allows for different properties (densities, viscosities, conductivity and permittivity) of each phase while maintaining thermodynamic consistency. To this end, we first present a theoretical analysis on the two-phase EHD flows by using the Onsager's variational principle, which is an extension of Rayleigh's principle of least energy dissipation and, naturally, guarantees thermodynamic consistency. It shows that the governing equations of the model include the hydrodynamic equations, Cahn-Hilliard equation coupled with additional electrical effect, and the full Poisson-Nernst-Planck electrokinetic equations. After that, a coupled lattice Boltzmann (LB) scheme is constructed for simulating two-phase EHD flows. In particular, in order to handle two-phase EHD flows with a relatively larger electric permittivity ratio, we also introduce a delicately designed discrete forcing term into the LB equation for electrostatic field. Moreover, some numerical examples including two-phase EHD flows in planar layers and charge diffusion of a Gaussian bell are simulated with the developed LB method. It is shown that our numerical scheme shares a second-order convergence rate in space in predicting electric potential and charge density. Finally, we used the current model to simulate the deformation of a droplet under an electric field and the dynamics of droplet detachment in reversed electrowetting. Our numerical results align well with the theoretic solutions, and the available experimental/numerical data, demonstrating that the proposed method is feasible for simulating two-phase EHD flows.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01178 [pdf, other]

$\text{Memory}^3$: Language Modeling with Explicit Memory

Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.

Submitted 1 July, 2024; originally announced July 2024.

MSC Class: 68T50; ACM Class: I.2.7

arXiv:2407.00668 [pdf, other]

HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability

Authors: Yanfang Chen, Ding Chen, Shichao Song, Simin Niu, Hanyu Wang, Zeyun Tang, Feiyu Xiong, Zhiyu Li

Abstract: As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-source dataset of health rumor information, as well as effective and reliable rumor detection methods. This paper addresses this gap by constructing a dataset containing 1.12 million health-related rumors (HealthRCN) through web scraping of common health-related questions and a series of data processing steps. HealthRCN is the largest known dataset of Chinese health information rumors to date. Based on this dataset, we propose retrieval-augmented large language models for Chinese health rumor detection and explainability (HRDE). This model leverages retrieved relevant information to accurately determine whether the input health information is a rumor and provides explanatory responses, effectively aiding users in verifying the authenticity of health information. In evaluation experiments, we compared multiple models and found that HRDE outperformed them all, including GPT-4-1106-Preview, in rumor detection accuracy and answer quality. HRDE achieved an average accuracy of 91.04% and an F1 score of 91.58%.

Submitted 3 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.16069 [pdf, other]

FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

Authors: Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko

Abstract: Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before inference by fine-tuning only the last Feed-Forward Network (FFN) module. This targeted approach ensures efficient optimization without overfitting, significantly improving the model's ability to comprehend and accurately follow the context. Our experiments demonstrate substantial gains in reading comprehension, text summarization and adherence to output structures. For instance, FastMem improves the accuracy of Llama 3-8B-Inst on the NQ-SWAP dataset from 59.1% to 71.6%, and reduces the output structure failure rate of Qwen 1.5-4B-Chat from 34.9% to 25.5%. Extensive experimental results highlight FastMem's potential to offer a robust solution to enhance the reliability and accuracy of LLMs in various applications. Our code is available at: https://github.com/IAAR-Shanghai/FastMem

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2405.20763 [pdf, other]

Improving Generalization and Convergence by Enhancing Implicit Regularization

Authors: Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

Abstract: In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overload. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).

Submitted 31 May, 2024; originally announced May 2024.

Comments: 35 pages

arXiv:2405.16933 [pdf, other]

Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning

Authors: Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Shichao Song, Hanyu Wang, Jiawei Yang, Feiyu Xiong, Bo Tang, Chenyang Xi

Abstract: Retrieval-Augmented Generation (RAG) offers a cost-effective approach to injecting real-time knowledge into large language models (LLMs). Nevertheless, constructing and validating high-quality knowledge repositories require considerable effort. We propose a pre-retrieval framework named Pseudo-Graph Retrieval-Augmented Generation (PG-RAG), which conceptualizes LLMs as students by providing them with abundant raw reading materials and encouraging them to engage in autonomous reading to record factual information in their own words. The resulting concise, well-organized mental indices are interconnected through common topics or complementary facts to form a pseudo-graph database. During the retrieval phase, PG-RAG mimics the human behavior in flipping through notes, identifying fact paths and subsequently exploring the related contexts. Adhering to the principle of the path taken by many is the best, it integrates highly corroborated fact paths to provide a structured and refined sub-graph assisting LLMs. We validated PG-RAG on three specialized question-answering datasets. In single-document tasks, PG-RAG significantly outperformed the current best baseline, KGP-LLaMA, across all key evaluation metrics, with an average overall performance improvement of 11.6%. Specifically, its BLEU score increased by approximately 14.3%, and the QE-F1 metric improved by 23.7%. In multi-document scenarios, the average metrics of PG-RAG were at least 2.35% higher than the best baseline. Notably, the BLEU score and QE-F1 metric showed stable improvements of around 7.55% and 12.75%, respectively. Our code: https://github.com/IAAR-Shanghai/PGRAG.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.11874 [pdf, other]

xFinder: Robust and Pinpoint Answer Extraction for Large Language Models

Authors: Qingchen Yu, Zifan Zheng, Shichao Song, Zhiyu Li, Feiyu Xiong, Bo Tang, Ding Chen

Abstract: The continuous advancement of large language models (LLMs) has brought increasing attention to the critical issue of developing fair and reliable methods for evaluating their performance. Particularly, the emergence of subjective or non-subjective cheating phenomena, such as test set leakage and prompt format overfitting, poses significant challenges to the reliable evaluation of LLMs. Since evaluation frameworks often utilize Regular Expression (RegEx) for answer extraction, some models may adjust their responses to comply with specific formats that are easily extractable by RegEx. Nevertheless, the key answer extraction module based on RegEx frequently suffers from extraction errors. This paper conducts a comprehensive analysis of the entire LLM evaluation chain, demonstrating that optimizing the key answer extraction module can improve extraction accuracy, reduce LLMs' reliance on specific answer formats, and enhance the reliability of LLM evaluation. To address these issues, we propose xFinder, a model specifically designed for key answer extraction. As part of this process, we create a specialized dataset, the Key Answer Finder (KAF) dataset, to ensure effective model training and evaluation. Through generalization testing and evaluation in real-world scenarios, the results demonstrate that the smallest xFinder model with only 500 million parameters achieves an average answer extraction accuracy of 93.42%. In contrast, RegEx accuracy in the best evaluation framework is 74.38%. xFinder exhibits stronger robustness and higher accuracy compared to existing evaluation frameworks.

Submitted 23 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 37 Pages

arXiv:2405.01726 [pdf, ps, other]

SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising

Authors: Guanyiman Fu, Fengchao Xiong, Jianfeng Lu, Jun Zhou

Abstract: Denoising is a crucial preprocessing step for hyperspectral images (HSIs) due to noise arising from intraimaging mechanisms and environmental factors. Long-range spatial-spectral correlation modeling is beneficial for HSI denoising but often comes with high computational complexity. Based on the state space model (SSM), Mamba is known for its remarkable long-range dependency modeling capabilities and computational efficiency. Building on this, we introduce a memory-efficient spatial-spectral UMamba (SSUMamba) for HSI denoising, with the spatial-spectral continuous scan (SSCS) Mamba being the core component. SSCS Mamba alternates the row, column, and band in six different orders to generate the sequence and uses the bidirectional SSM to exploit long-range spatial-spectral dependencies. In each order, the images are rearranged between adjacent scans to ensure spatial-spectral continuity. Additionally, 3D convolutions are embedded into the SSCS Mamba to enhance local spatial-spectral modeling. Experiments demonstrate that SSUMamba achieves superior denoising results with lower memory consumption per batch compared to transformer-based methods. The source code is available at https://github.com/lronkitty/SSUMamba.

Submitted 20 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.06926 [pdf, other]

Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting

Authors: Xiaolei Lang, Laijian Li, Hang Zhang, Feng Xiong, Mu Xu, Yong Liu, Xingxing Zuo, Jiajun Lv

Abstract: We present a real-time LiDAR-Inertial-Camera SLAM system with 3D Gaussian Splatting as the mapping backend. Leveraging robust pose estimates from our LiDAR-Inertial-Camera odometry, Coco-LIC, an incremental photo-realistic mapping system is proposed in this paper. We initialize 3D Gaussians from colorized LiDAR points and optimize them using differentiable rendering powered by 3D Gaussian Splatting. Meticulously designed strategies are employed to incrementally expand the Gaussian map and adaptively control its density, ensuring high-quality mapping with real-time capability. Experiments conducted in diverse scenarios demonstrate the superior performance of our method compared to existing radiance-field-based SLAM systems.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Submitted to IROS 2024

arXiv:2403.12839 [pdf, other]

Global-guided Focal Neural Radiance Field for Large-scale Scene Rendering

Authors: Mingqi Shao, Feng Xiong, Hang Zhang, Shuang Yang, Mu Xu, Wei Bian, Xueqian Wang

Abstract: Neural radiance fields (NeRF) have recently been applied to render large-scale scenes. However, their limited model capacity typically results in blurred rendering results. Existing large-scale NeRFs primarily address this limitation by partitioning the scene into blocks, which are subsequently handled by separate sub-NeRFs. These sub-NeRFs, trained from scratch and processed independently, lead to inconsistencies in geometry and appearance across the scene. Consequently, the rendering quality fails to exhibit significant improvement despite the expansion of model capacity. In this work, we present global-guided focal neural radiance field (GF-NeRF) that achieves high-fidelity rendering of large-scale scenes. Our proposed GF-NeRF utilizes a two-stage (Global and Focal) architecture and a global-guided training strategy. The global stage obtains a continuous representation of the entire scene while the focal stage decomposes the scene into multiple blocks and further processes them with distinct sub-encoders. Leveraging this two-stage architecture, sub-encoders only need fine-tuning based on the global encoder, thus reducing training complexity in the focal stage while maintaining scene-wide consistency. Spatial information and error information from the global stage also benefit the sub-encoders to focus on crucial areas and effectively capture more details of large-scale scenes. Notably, our approach does not rely on any prior knowledge about the target scene, attributing GF-NeRF adaptable to various large-scale scene types, including street-view and aerial-view scenes. We demonstrate that our method achieves high-fidelity, natural rendering results on various types of large-scale datasets. Our project page: https://shaomq2187.github.io/GF-NeRF/

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.04283 [pdf, other]

Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy

Authors: Yu Zhu, Chuxiong Sun, Wenfei Yang, Wenqiang Wei, Bo Tang, Tianzhu Zhang, Zhiyu Li, Shifeng Zhang, Feiyu Xiong, Jie Hu, Mingchuan Yang

Abstract: Reinforcement Learning from Human Feedback (RLHF) is the prevailing approach to ensure Large Language Models (LLMs) align with human values. However, existing RLHF methods require a high computational cost, one main reason being that RLHF assigns both the generation and alignment tasks to the LLM simultaneously. In this paper, we introduce Proxy-RLHF, which decouples the generation and alignment processes of LLMs, achieving alignment with human values at a much lower computational cost. We start with a novel Markov Decision Process (MDP) designed for the alignment process and employ Reinforcement Learning (RL) to train a streamlined proxy model that oversees the token generation of the LLM, without altering the LLM itself. Experiments show that our method achieves a comparable level of alignment with only 1% of the training parameters of other methods.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.00862 [pdf, other]

NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism

Authors: Miao Li, Ming-Bin Chen, Bo Tang, Shengbin Hou, Pengyu Wang, Haiying Deng, Zhiyu Li, Feiyu Xiong, Keming Mao, Peng Cheng, Yi Luo

Abstract: We present NewsBench, a novel evaluation framework to systematically assess the capabilities of Large Language Models (LLMs) for editorial capabilities in Chinese journalism. Our constructed benchmark dataset is focused on four facets of writing proficiency and six facets of safety adherence, and it comprises manually and carefully designed 1,267 test samples in the types of multiple choice questions and short answer questions for five editorial tasks in 24 news domains. To measure performances, we propose different GPT-4 based automatic evaluation protocols to assess LLM generations for short answer questions in terms of writing proficiency and safety adherence, and both are validated by the high correlations with human evaluations. Based on the systematic evaluation framework, we conduct a comprehensive analysis of ten popular LLMs which can handle Chinese. The experimental results highlight GPT-4 and ERNIE Bot as top performers, yet reveal a relative deficiency in journalistic safety adherence in creative writing tasks. Our findings also underscore the need for enhanced ethical guidance in machine-generated journalistic content, marking a step forward in aligning LLMs with journalistic standards and safety considerations.

Submitted 4 June, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: Long paper, ACL 2024 Main

arXiv:2402.11218 [pdf, other]

Controlled Text Generation for Large Language Model with Dynamic Attribute Graphs

Authors: Xun Liang, Hanyu Wang, Shichao Song, Mengting Hu, Xunzhi Wang, Zhiyu Li, Feiyu Xiong, Bo Tang

Abstract: Controlled Text Generation (CTG) aims to produce texts that exhibit specific desired attributes. In this study, we introduce a pluggable CTG framework for Large Language Models (LLMs) named Dynamic Attribute Graphs-based controlled text generation (DATG). This framework utilizes an attribute scorer to evaluate the attributes of sentences generated by LLMs and constructs dynamic attribute graphs. DATG modulates the occurrence of key attribute words and key anti-attribute words, achieving effective attribute control without compromising the original capabilities of the model. We conduct experiments across four datasets in two tasks: toxicity mitigation and sentiment transformation, employing five LLMs as foundational models. Our findings highlight a remarkable enhancement in control accuracy, achieving a peak improvement of 19.29% over baseline methods in the most favorable task across four datasets. Additionally, we observe a significant decrease in perplexity, markedly improving text fluency.

Submitted 24 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: 18 Pages, Accepted by ACL 2024 Findings

arXiv:2402.07744 [pdf, other]

Towards Unified Alignment Between Agents, Humans, and Environment

Authors: Zonghan Yang, An Liu, Zijun Liu, Kaiming Liu, Fangzhou Xiong, Yile Wang, Zeyuan Yang, Qingyuan Hu, Xinrui Chen, Zhenhe Zhang, Fuwen Luo, Zhicheng Guo, Peng Li, Yang Liu

Abstract: The rapid progress of foundation models has led to the prosperity of autonomous agents, which leverage the universal capabilities of foundation models to conduct reasoning, decision-making, and environmental interaction. However, the efficacy of agents remains limited when operating in intricate, realistic environments. In this work, we introduce the principles of $\mathbf{U}$nified $\mathbf{A}$lignment for $\mathbf{A}$gents ($\mathbf{UA}^2$), which advocate for the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints such as the limitation of monetary budgets. From the perspective of $\mathbf{UA}^2$, we review the current agent research and highlight the neglected factors in existing agent benchmarks and method candidates. We also conduct proof-of-concept studies by introducing realistic features to WebShop, including user profiles to demonstrate intentions, personalized reranking for complex environmental dynamics, and runtime cost statistics to reflect self-constraints. We then follow the principles of $\mathbf{UA}^2$ to propose an initial design of our agent, and benchmark its performance with several candidate baselines in the retrofitted WebShop. The extensive experimental results further prove the importance of the principles of $\mathbf{UA}^2$. Our research sheds light on the next steps of autonomous agent research with improved general problem-solving abilities.

Submitted 14 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Project webpage: https://agent-force.github.io/unified-alignment-for-agents.html

arXiv:2401.17043 [pdf, other]

CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Authors: Yuanjie Lyu, Zhiyu Li, Simin Niu, Feiyu Xiong, Bo Tang, Wenjin Wang, Hao Wu, Huanyong Liu, Tong Xu, Enhong Chen

Abstract: Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, the evaluation of RAG systems is challenging, as existing benchmarks are limited in scope and diversity. Most of the current benchmarks predominantly assess question-answering applications, overlooking the broader spectrum of situations where RAG could prove advantageous. Moreover, they only evaluate the performance of the LLM component of the RAG pipeline in the experiments, and neglect the influence of the retrieval component and the external knowledge database. To address these issues, this paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios. Specifically, we have categorized the range of RAG applications into four distinct types: Create, Read, Update, and Delete (CRUD), each representing a unique use case. "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves responding to intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to the task of summarizing extensive texts into more concise forms.
For each of these CRUD categories, we have developed comprehensive datasets to evaluate the performance of RAG systems. We also analyze the effects of various components of the RAG system, such as the retriever, the context length, the knowledge base construction, and the LLM. Finally, we provide useful insights for optimizing the RAG technology for different scenarios.

Submitted 15 July, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: 40 Pages

arXiv:2401.12326 [pdf, other]

Fine-tuning Large Language Models for Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection

Authors: Feng Xiong, Thanet Markchom, Ziwei Zheng, Subin Jung, Varun Ojha, Huizhi Liang

Abstract: SemEval-2024 Task 8 introduces the challenge of identifying machine-generated texts from diverse Large Language Models (LLMs) in various languages and domains. The task comprises three subtasks: binary classification in monolingual and multilingual settings (Subtask A), multi-class classification (Subtask B), and mixed text detection (Subtask C). This paper focuses on Subtasks A and B. Each subtask is supported by three datasets for training, development, and testing. To tackle this task, two methods are used: 1) traditional machine learning (ML) with natural language preprocessing (NLP) for feature extraction, and 2) fine-tuning LLMs for text classification. The results show that transformer models, particularly LoRA-RoBERTa, exceed traditional ML methods in effectiveness, with majority voting being particularly effective in multilingual contexts for identifying machine-generated texts.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.03885 [pdf, ps, other]

doi 10.1109/TGRS.2024.3374953

Hyperspectral Image Denoising via Spatial-Spectral Recurrent Transformer

Authors: Guanyiman Fu, Fengchao Xiong, Jianfeng Lu, Jun Zhou, Jiantao Zhou, Yuntao Qian

Abstract: Hyperspectral images (HSIs) often suffer from noise arising from both intra-imaging mechanisms and environmental factors. Leveraging domain knowledge specific to HSIs, such as global spectral correlation (GSC) and non-local spatial self-similarity (NSS), is crucial for effective denoising. Existing methods tend to independently utilize each of these knowledge components with multiple blocks, overlooking the inherent 3D nature of HSIs where domain knowledge is strongly interlinked, resulting in suboptimal performance. To address this challenge, this paper introduces a spatial-spectral recurrent transformer U-Net (SSRT-UNet) for HSI denoising. The proposed SSRT-UNet integrates NSS and GSC properties within a single SSRT block. This block consists of a spatial branch and a spectral branch. The spectral branch employs a combination of transformer and recurrent neural network to perform recurrent computations across bands, allowing for GSC exploitation beyond a fixed number of bands. Concurrently, the spatial branch encodes NSS for each band by sharing keys and values with the spectral branch under the guidance of GSC. This interaction between the two branches enables the joint utilization of NSS and GSC, avoiding their independent treatment. Experimental results demonstrate that our method outperforms several alternative approaches. The source code will be available at https://github.com/lronkitty/SSRT.

Submitted 8 January, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

arXiv:2401.03385 [pdf, other]

Grimoire is All You Need for Enhancing Large Language Models

Authors: Ding Chen, Shichao Song, Qingchen Yu, Zhiyu Li, Wenjin Wang, Feiyu Xiong, Bo Tang

Abstract: In-context Learning (ICL) is one of the key methods for enhancing the performance of large language models on specific tasks by providing a set of few-shot examples. However, the ICL capability of different types of models shows significant variation due to factors such as model architecture, volume of learning data, and the size of parameters. Generally, the larger the model's parameter size and the more extensive the learning data, the stronger its ICL capability. In this paper, we propose a method SLEICL that involves learning from examples using strong language models and then summarizing and transferring these learned skills to weak language models for inference and application. This ensures the stability and effectiveness of ICL. Compared to directly enabling weak language models to learn from prompt examples, SLEICL reduces the difficulty of ICL for these models. Our experiments, conducted on up to eight datasets with five language models, demonstrate that weak language models achieve consistent improvement over their own zero-shot or few-shot capabilities using the SLEICL method. Some weak language models even surpass the performance of GPT4-1106-preview (zero-shot) with the aid of SLEICL.

Submitted 10 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: 9 pages

arXiv:2311.15296 [pdf, other]

UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation

Authors: Xun Liang, Shichao Song, Simin Niu, Zhiyu Li, Feiyu Xiong, Bo Tang, Yezhaohui Wang, Dawei He, Peng Cheng, Zhonghao Wang, Haiying Deng

Abstract: Large language models (LLMs) have emerged as pivotal contributors in contemporary natural language processing and are increasingly being applied across a diverse range of industries. However, these large-scale probabilistic statistical models cannot currently ensure the requisite quality in professional content generation. These models often produce hallucinated text, compromising their practical utility in professional contexts. To assess the authentic reliability of LLMs in text generation, numerous initiatives have developed benchmark evaluations for hallucination phenomena. Nevertheless, these benchmarks frequently utilize constrained generation techniques due to cost and temporal constraints. These techniques encompass the use of directed hallucination induction and strategies that deliberately alter authentic text to produce hallucinations. These approaches are not congruent with the unrestricted text generation demanded by real-world applications. Furthermore, a well-established Chinese-language dataset dedicated to the evaluation of hallucinations in text generation is presently lacking. Consequently, we have developed an Unconstrained Hallucination Generation Evaluation (UHGEval) benchmark, designed to compile outputs produced with minimal restrictions by LLMs. Concurrently, we have established a comprehensive benchmark evaluation framework to aid subsequent researchers in undertaking scalable and reproducible experiments.
We have also executed extensive experiments, evaluating prominent Chinese language models and the GPT series models to derive professional performance insights regarding hallucination challenges.

Submitted 23 May, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: Accepted by ACL 2024

arXiv:2305.01422 [pdf, other]

doi 10.1103/PhysRevB.109.054201

Distinct quasiparticle interference patterns for surface impurity scattering on various Weyl semimetals

Authors: Feng Xiong, Chaocheng He, Yong Liu, Annica M. Black-Schaffer, Tanay Nag

Abstract: We examine the response of the Fermi arc in the context of quasi-particle interference (QPI) with regard to a localized surface impurity on various three-dimensional Weyl semimetals (WSMs). Our study also reveals the variation of the local density of states (LDOS), obtained by Fourier transforming the QPI profile, on the two-dimensional surface. We use the $T$-matrix formalism to numerically (analytically and numerically) capture the details of the momentum space scattering in QPI (real space decay in LDOS), considering relevant tight-binding lattice and/or low-energy continuum models modeling a range of different WSMs. In particular, we consider multi-WSM (mWSM), hosting multiple Fermi arcs between two opposite chirality Weyl nodes (WNs), where we find a universal $1/r$-decay ($r$ measuring the radial distance from the impurity core) of the impurity-induced LDOS, irrespective of the topological charge. Interestingly, the inter-Fermi arc scattering is only present for triple WSMs, where we find an additional $1/r^3$-decay as compared to double and single WSMs. The untilted single (double) [triple] WSM shows a straight-line (leaf-like) [oval-shaped] QPI profile. The above QPI profiles are canted for hybrid WSMs where type-I and type-II Weyl nodes coexist; however, the hybrid single WSM demonstrates strong non-uniformity, unlike the hybrid double and triple WSMs. We also show that the chirality and the positions of the Weyl nodes imprint marked signatures in the QPI profile.
This allows us to distinguish between different WSMs, including the time-reversal-broken WSMs from the time-reversal-invariant WSM, even though both of the WSMs can host two pairs of Weyl nodes. Our study can thus shed light on experimentally obtainable complex QPI profiles and help differentiate different WSMs and their surface band structures.

Submitted 28 March, 2024; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 19 pages, 6 figures

Journal ref: Phys. Rev. B 109, 054201 (2024)

arXiv:2304.09048 [pdf, other]

CodeKGC: Code Language Model for Generative Knowledge Graph Construction

Authors: Zhen Bi, Jing Chen, Yinuo Jiang, Feiyu Xiong, Wei Guo, Huajun Chen, Ningyu Zhang

Abstract: Current generative knowledge graph construction approaches usually fail to capture structural knowledge by simply flattening natural language into serialized texts or a specification language. However, large generative language models trained on structured data such as code have demonstrated impressive capability in understanding natural language for structural prediction and reasoning tasks. Intuitively, we address the task of generative knowledge graph construction with a code language model: given a code-format natural language input, the target is to generate triples which can be represented as code completion tasks. Specifically, we develop schema-aware prompts that effectively utilize the semantic structure within the knowledge graph. As code inherently possesses structure, such as class and function definitions, it serves as a useful model for prior semantic structural knowledge. Furthermore, we employ a rationale-enhanced generation method to boost the performance. Rationales provide intermediate steps, thereby improving knowledge extraction abilities. Experimental results indicate that the proposed approach can obtain better performance on benchmark datasets compared with baselines. Code and datasets are available at https://github.com/zjunlp/DeepKE/tree/main/example/llm.

Submitted 18 January, 2024; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: ACM Transactions on Asian and Low-Resource Language Information Processing

arXiv:2304.07423 [pdf, other]

Instability and Momentum Bifurcation of molecular BEC in Exotic Dispersion with Shaken Lattice

Authors: Kaiyue Wang, Feng Xiong, Yun Long, Yun Ma, Colin V. Parker

Abstract: We place a molecular Bose-Einstein condensate in a 1D shaken lattice with a Floquet-engineered dispersion, and observe the dynamics in both position and momentum space. At the initial condition of zero momentum, our engineered dispersion is inverted, and therefore unstable. We observe that the condensate is destabilized by the lattice shaking as expected, but rather than decaying incoherently or producing jets, as in other unstable condensates, under our conditions the condensate bifurcates into two portions in momentum space, with each portion subsequently following semi-classical trajectories that suffer minimal spreading in momentum space as they evolve. We can model the evolution with a Gross-Pitaevskii equation, which suggests the initial bifurcation is facilitated by a nearly linear "inverted V"-shaped dispersion at the zone center, while the lack of spreading in momentum space is facilitated by interactions, as in a soliton. We propose that this relatively clean bifurcation in momentum space has applications for counter-diabatic preparation of exotic ground states in many-body quantum simulation schemes.

Submitted 24 August, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.06925

YOLO-Drone:Airborne real-time detection of dense small objects from high-altitude perspective

Authors: Li Zhu, Jiahui Xiong, Feng Xiong, Hanzheng Hu, Zhengnan Jiang

Abstract: Unmanned Aerial Vehicles (UAVs), specifically drones equipped with remote sensing object detection technology, have rapidly gained a broad spectrum of applications and emerged as one of the primary research focuses in the field of computer vision. Although UAV remote sensing systems have the ability to detect various objects, small-scale objects can be challenging to detect reliably due to factors such as object size, image degradation, and real-time limitations. To tackle these issues, a real-time object detection algorithm (YOLO-Drone) is proposed and applied to two new UAV platforms as well as a specific light source (silicon-based golden LED). YOLO-Drone presents several novelties: 1) a new backbone, Darknet59; 2) a new complex feature aggregation module, MSPP-FPN, that incorporates one spatial pyramid pooling and three atrous spatial pyramid pooling modules; and 3) the use of Generalized Intersection over Union (GIoU) as the loss function. To evaluate performance, two benchmark datasets, UAVDT and VisDrone, along with one homemade dataset acquired at night under silicon-based golden LEDs, are utilized. The experimental results show that, on both UAVDT and VisDrone, the proposed YOLO-Drone outperforms state-of-the-art (SOTA) object detection methods, improving the mAP by 10.13% and 8.59%, respectively. With regard to UAVDT, YOLO-Drone exhibits both a high real-time inference speed of 53 FPS and a maximum mAP of 34.04%.
Notably, YOLO-Drone achieves high performance under the silicon-based golden LEDs, with a mAP of up to 87.71%, surpassing the performance of the YOLO series under ordinary light sources. To conclude, the proposed YOLO-Drone is a highly effective solution for object detection in UAV applications, particularly for night detection tasks where silicon-based golden light LED technology exhibits significant superiority.

Submitted 10 October, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Some contributing authors are not signed

arXiv:2303.02959 [pdf, other]

Butterfly: Multiple Reference Frames Feature Propagation Mechanism for Neural Video Compression

Authors: Feng Wang, Haihang Ruan, Fei Xiong, Jiayu Yang, Litian Li, Ronggang Wang

Abstract: Using more reference frames can significantly improve the compression efficiency in neural video compression. However, in low-latency scenarios, most existing neural video compression frameworks use only the previous frame as reference, and the few frameworks that use multiple previous frames as reference adopt only a simple multi-reference frame propagation mechanism. In this paper, we present a more reasonable multi-reference frame propagation mechanism for neural video compression, called the butterfly multi-reference frame propagation mechanism (Butterfly), which allows a more effective feature fusion of multi-reference frames. With this, we can generate a more accurate temporal context conditional prior for the Contextual Coding Module. Besides, when the number of decoded frames does not meet the required number of reference frames, we duplicate the nearest reference frame to meet the requirement, which is better than duplicating the furthest one. Experimental results show that our method can significantly outperform the previous state-of-the-art (SOTA), and our neural codec can achieve -7.6% bitrate savings on the HEVC Class D dataset when compared with our base single-reference frame model with the same compression configuration.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted by DCC 2023

arXiv:2211.07504 [pdf, other]

On Analyzing the Role of Image for Visual-enhanced Relation Extraction

Authors: Lei Li, Xiang Chen, Shuofei Qiao, Feiyu Xiong, Huajun Chen, Ningyu Zhang

Abstract: Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we conduct an in-depth empirical analysis indicating that inaccurate information in the visual scene graph leads to poor modal alignment weights, further degrading performance. Moreover, the visual shuffle experiments illustrate that the current approaches may not take full advantage of visual information. Based on the above observation, we further propose a strong baseline with an implicit fine-grained multimodal alignment based on Transformer for multimodal relation extraction. Experimental results demonstrate the better performance of our method. Codes are available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.

Submitted 14 November, 2022; originally announced November 2022.

Comments: Accepted by AAAI 2023 (Student Abstract)

arXiv:2210.08142 [pdf]

Time-resolved temperature mapping leveraging the strong thermo-optic effect in phase-change devices

Authors: Nicholas A. Nobile, John R. Erickson, Carlos Ríos, Yifei Zhang, Juejun Hu, Steven A. Vitale, Feng Xiong, Nathan Youngblood

Abstract: Optical phase-change materials are highly promising for emerging applications such as tunable metasurfaces, reconfigurable photonic circuits, and non-von Neumann computing. However, these materials typically require both high melting temperatures and fast quenching rates to reversibly switch between their crystalline and amorphous phases, a significant challenge for large-scale integration. Here, we present an experimental technique which leverages the thermo-optic effect in GST to enable both spatial and temporal thermal measurements of two common electro-thermal microheater designs currently used by the phase-change community. Our approach shows excellent agreement between experimental results and numerical simulations and provides a non-invasive method for rapid characterization of electrically programmable phase-change devices.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2209.15214 [pdf, other]

Construction and Applications of Billion-Scale Pre-Trained Multimodal Business Knowledge Graph

Authors: Shumin Deng, Chengming Wang, Zhoubo Li, Ningyu Zhang, Zelin Dai, Hehong Chen, Feiyu Xiong, Ming Yan, Qiang Chen, Mosha Chen, Jiaoyan Chen, Jeff Z. Pan, Bryan Hooi, Huajun Chen

Abstract: Business Knowledge Graphs (KGs) are important to many enterprises today, providing factual knowledge and structured data that steer many products and make them more intelligent. Despite their promising benefits, building business KG necessitates solving prohibitive issues of deficient structure and multiple modalities. In this paper, we advance the understanding of the practical challenges related to building KG in non-trivial real-world systems. We introduce the process of building an open business knowledge graph (OpenBG) derived from a well-known enterprise, Alibaba Group. Specifically, we define a core ontology to cover various abstract products and consumption demands, with fine-grained taxonomy and multimodal facts in deployed applications. OpenBG is an open business KG of unprecedented scale: 2.6 billion triples with more than 88 million entities covering over 1 million core classes/concepts and 2,681 types of relations. We release all the open resources (OpenBG benchmarks) derived from it for the community and report experimental results of KG-centric tasks. We also ran an online competition based on OpenBG benchmarks, which has attracted thousands of teams. We further pre-train OpenBG and apply it to many KG-enhanced downstream tasks in business scenarios, demonstrating the effectiveness of billion-scale multimodal knowledge for e-commerce. All the resources with codes have been released at https://github.com/OpenBGBenchmark/OpenBG.

Submitted 19 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: OpenBG. Accepted by ICDE 2023. The project is released at https://github.com/OpenBGBenchmark/OpenBG . Website: https://kg.alibaba.com/ , Leaderboard: https://tianchi.aliyun.com/dataset/dataDetail?dataId=122271

arXiv:2207.07790 [pdf, other]

BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion

Authors: Fanglin Chen, Xiao Liu, Bo Tang, Feiyu Xiong, Serim Hwang, Guomian Zhuang

Abstract: We utilize an offline reinforcement learning (RL) model for sequential targeted promotion in the presence of budget constraints in a real-world business environment. In our application, the mobile app aims to boost customer retention by sending cash bonuses to customers and control the costs of such cash bonuses during each time period. To achieve the multi-task goal, we propose the Budget Constrained Reinforcement Learning for Sequential Promotion (BCRLSP) framework to determine the value of cash bonuses to be sent to users. We first find the target policy and the associated Q-values that maximize the user retention rate using an RL model. A linear programming (LP) model is then added to satisfy the constraints of promotion costs. We solve the LP problem by maximizing the Q-values of actions learned from the RL model given the budget constraints. During deployment, we combine the offline RL model with the LP model to generate a robust policy under the budget constraints. Using both online and offline experiments, we demonstrate the efficacy of our approach by showing that BCRLSP achieves a higher long-term customer retention rate and a lower cost than various baselines. Taking advantage of the near real-time cost control method, the proposed framework can easily adapt to data with a noisy behavioral policy and/or meet flexible budget constraints.

Submitted 15 July, 2022; originally announced July 2022.

Comments: 8 pages, DRL4IR@SIGIR

arXiv:2206.03864 [pdf, other]

doi 10.36227/techrxiv.19391279

Discontinuity Computing using Physics-Informed Neural Network

Authors: Li Liu, Shengping Liu, Hui Xie, Fansheng Xiong, Tengchao Yu, Mengjuan Xiao, Lufeng Liu, Heng Yong

Abstract: Simulating discontinuities is a long-standing problem, especially for shock waves with strong nonlinear features. Despite being a promising method, the recently developed physics-informed neural network (PINN) is still weak for calculating discontinuities compared with traditional shock-capturing methods. In this paper, we intend to improve the shock-capturing ability of the PINN. The primary strategy of this work is to weaken the expression of the network near discontinuities by adding a gradient-weight into the governing equations locally at each residual point. This strategy allows the network to focus on training smooth parts of the solutions. Then, automatically affected by the compressible property near shock waves, a sharp discontinuity appears with wrong inside shock transition-points compressed into well-trained smooth regions as passive particles. We study the solutions of the one-dimensional Burgers equation and one- and two-dimensional Euler equations. Compared with the traditional high-order WENO-Z method in numerical examples, the proposed method can substantially improve discontinuity computing.

Submitted 6 August, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

arXiv:2206.03739 [pdf, other]

Disentangled Ontology Embedding for Zero-shot Learning

Authors: Yuxia Geng, Jiaoyan Chen, Wen Zhang, Yajing Xu, Zhuo Chen, Jeff Z. Pan, Yufeng Huang, Feiyu Xiong, Huajun Chen

Abstract: Knowledge Graph (KG) and its variant of ontology have been widely used for knowledge representation, and have been shown to be quite effective in augmenting Zero-shot Learning (ZSL). However, existing ZSL methods that utilize KGs all neglect the intrinsic complexity of inter-class relationships represented in KGs. One typical feature is that a class is often related to other classes in different semantic aspects. In this paper, we focus on ontologies for augmenting ZSL, and propose to learn disentangled ontology embeddings guided by ontology properties to capture and utilize more fine-grained class relationships in different aspects. We also contribute a new ZSL framework named DOZSL, which contains two new ZSL solutions based on generative models and graph propagation models, respectively, for effectively utilizing the disentangled ontology embeddings. Extensive evaluations have been conducted on five benchmarks across zero-shot image classification (ZS-IMGC) and zero-shot KG completion (ZS-KGC). DOZSL often achieves better performance than the state-of-the-art, and its components have been verified by ablation studies and case studies. Our code and datasets are available at https://github.com/zjukg/DOZSL.

Submitted 8 June, 2022; originally announced June 2022.

Comments: Accepted by KDD'22

arXiv:2205.10852 [pdf, other]

doi 10.1016/j.neucom.2023.127044

Relphormer: Relational Graph Transformer for Knowledge Graph Representations

Authors: Zhen Bi, Siyuan Cheng, Jing Chen, Xiaozhuan Liang, Feiyu Xiong, Ningyu Zhang

Abstract: Transformers have achieved remarkable performance in widespread fields, including natural language processing, computer vision and graph mining. However, vanilla Transformer architectures have not yielded promising improvements in Knowledge Graph (KG) representations, where the translational distance paradigm dominates. Note that vanilla Transformer architectures struggle to capture the intrinsically heterogeneous structural and semantic information of knowledge graphs. To this end, we propose a new variant of Transformer for knowledge graph representations dubbed Relphormer. Specifically, we introduce Triple2Seq, which can dynamically sample contextualized sub-graph sequences as the input to alleviate the heterogeneity issue. We propose a novel structure-enhanced self-attention mechanism to encode the relational information and keep the semantic information within entities and relations. Moreover, we utilize masked knowledge modeling for general knowledge graph representation learning, which can be applied to various KG-based tasks including knowledge graph completion, question answering, and recommendation. Experimental results on six datasets show that Relphormer can obtain better performance compared with baselines. Code is available at https://github.com/zjunlp/Relphormer.

Submitted 21 November, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

Comments: Neurocomputing 2023

arXiv:2205.10362 [pdf, ps, other]

FIND: Explainable Framework for Meta-learning

Authors: Xinyue Shao, Hongzhi Wang, Xiao Zhu, Feng Xiong

Abstract: Meta-learning is used to efficiently enable the automatic selection of machine learning models by combining data and prior knowledge. Since the traditional meta-learning technique lacks explainability and has shortcomings in terms of transparency and fairness, achieving explainability for meta-learning is crucial. This paper proposes FIND, an interpretable meta-learning framework that can not only explain the recommendation results of meta-learning algorithm selection, but also provide a more complete and accurate explanation of the recommended algorithm's performance on specific datasets combined with business scenarios. The validity and correctness of this framework have been demonstrated by extensive experiments.

Submitted 12 June, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

arXiv:2205.05889 [pdf, other]

Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction

Authors: Tianshu Wang, Hongyu Lin, Cheng Fu, Xianpei Han, Le Sun, Feiyu Xiong, Hui Chen, Minlong Lu, Xiuwen Zhu

Abstract: Entity matching (EM) is the most critical step for entity resolution (ER). While current deep learning-based methods achieve very impressive performance on standard EM benchmarks, their real-world application performance is much more frustrating. In this paper, we highlight that this gap between reality and ideality stems from the unreasonable benchmark construction process, which is inconsistent with the nature of entity matching and therefore leads to biased evaluations of current EM approaches. To this end, we build a new EM corpus and re-construct EM benchmarks to challenge critical assumptions implicit in the previous benchmark construction process by step-wise changing the restricted entities, balanced labels, and single-modal records in previous benchmarks into open entities, imbalanced labels, and multimodal records in an open environment. Experimental results demonstrate that the assumptions made in the previous benchmark construction process do not hold in the open environment; they conceal the main challenges of the task and therefore significantly overestimate the current progress of entity matching. The constructed benchmarks and code are publicly released.

Submitted 12 May, 2022; originally announced May 2022.

Comments: Accepted to IJCAI2022

arXiv:2202.12571 [pdf, other]

NeuralKG: An Open Source Library for Diverse Representation Learning of Knowledge Graphs

Authors: Wen Zhang, Xiangnan Chen, Zhen Yao, Mingyang Chen, Yushan Zhu, Hongtao Yu, Yufeng Huang, Zezhong Xu, Yajing Xu, Ningyu Zhang, Zonggang Yuan, Feiyu Xiong, Huajun Chen

Abstract: NeuralKG is an open-source Python-based library for diverse representation learning of knowledge graphs. It implements three different series of Knowledge Graph Embedding (KGE) methods, including conventional KGEs, GNN-based KGEs, and Rule-based KGEs. With a unified framework, NeuralKG successfully reproduces link prediction results of these methods on benchmarks, freeing users from the laborious task of reimplementing them, especially for some methods originally written in non-Python programming languages. Besides, NeuralKG is highly configurable and extensible. It provides various decoupled modules that can be mixed and adapted to each other. Thus with NeuralKG, developers and researchers can quickly implement their own designed models and obtain the optimal training methods to achieve the best performance efficiently. We built a website at http://neuralkg.zjukg.cn to organize an open and shared KG representation learning community. The source code is all publicly released at https://github.com/zjukg/NeuralKG.

Submitted 25 February, 2022; originally announced February 2022.

Comments: work in progress

arXiv:2202.08610 [pdf, other]

doi 10.1103/PhysRevB.106.045424

Understanding the three-dimensional quantum Hall effect in generic multi-Weyl semimetals

Authors: Feng Xiong, Carsten Honerkamp, Dante M. Kennes, Tanay Nag

Abstract: The quantum Hall effect in three-dimensional Weyl semimetal (WSM) receives significant attention for the emergence of the Fermi loop where the underlying two-dimensional Hall conductivity, namely, sheet Hall conductivity, shows quantized plateaus. Considering the tilted lattice models for multi-Weyl semimetals (mWSMs), we systematically study the Landau levels (LLs) and magneto-Hall conductivity in the presence of parallel and perpendicular (with respect to the Weyl node's separation) magnetic field, i.e., $\mathbf{B}\parallel z$ and $\mathbf{B}\parallel x$, to explore the impact of tilting and non-linearity in the dispersion. We make use of two (single) node low-energy models to qualitatively explain the emergence of mid-gap chiral (linear crossing of chiral) LLs on the lattice for $\mathbf{B}\parallel z$ ($\mathbf{B}\parallel x$). Remarkably, we find that the sheet Hall conductivity becomes quantized for $\mathbf{B}\parallel z$ even when two Weyl nodes project onto a single Fermi point in two opposite surfaces, forming a Fermi loop with $k_z$ as the good quantum number. On the other hand, the Fermi loop, connecting two distinct Fermi points in two opposite surfaces, with $k_x$ being the good quantum number, causes the quantization in sheet Hall conductivity for $\mathbf{B}\parallel x$. The quantization is almost lost (perfectly retained) in the type-II phase for $\mathbf{B}\parallel x$ ($\mathbf{B}\parallel z$). Interestingly, the jump profiles between the adjacent quantized plateaus change with the topological charge for both of the above cases. The momentum-integrated three-dimensional Hall conductivity is not quantized; however, it bears the signature of chiral LLs, resulting in a linear dependence on $\mu$ for small $\mu$. The linear zone (its slope) reduces (increases) as the tilt (topological charge) of the underlying WSM increases.

Submitted 31 July, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: 19 pages and 9 figures

Journal ref: Phys. Rev. B 106, 045424 (2022)

arXiv:2202.02113 [pdf, other]

doi 10.1145/3487553.3524238

From Discrimination to Generation: Knowledge Graph Completion with Generative Transformer

Authors: Xin Xie, Ningyu Zhang, Zhoubo Li, Shumin Deng, Hui Chen, Feiyu Xiong, Mosha Chen, Huajun Chen

Abstract: Knowledge graph completion aims to address the problem of extending a KG with missing triples. In this paper, we provide an approach, GenKGC, which converts knowledge graph completion to a sequence-to-sequence generation task with a pre-trained language model. We further introduce relation-guided demonstration and entity-aware hierarchical decoding for better representation learning and fast inference. Experimental results on three datasets show that our approach can obtain performance better than or comparable to baselines and achieve faster inference speed compared with previous methods with pre-trained language models. We also release a new large-scale Chinese knowledge graph dataset, AliopenKG500, for research purposes. Code and datasets are available at https://github.com/zjunlp/PromptKG/tree/main/GenKGC.

Submitted 14 March, 2023; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: Accepted by WWW 2022 Poster

arXiv:2201.12673 [pdf, other]

doi 10.1109/JETCAS.2023.3330832

Building time-surfaces by exploiting the complex volatility of an ECRAM memristor

Authors: Marco Rasetto, Qingzhou Wan, Himanshu Akolkar, Feng Xiong, Bertram Shi, Ryad Benosman

Abstract: Memristors have emerged as a promising technology for efficient neuromorphic architectures owing to their ability to act as programmable synapses, combining processing and memory into a single device. Although they are most commonly used for static encoding of synaptic weights, recent work has begun to investigate the use of their dynamical properties, such as Short Term Plasticity (STP), to integrate events over time in event-based architectures. However, we are still far from completely understanding the range of possible behaviors and how they might be exploited in neuromorphic computation. This work focuses on a newly developed Li$_\textbf{x}$WO$_\textbf{3}$-based three-terminal memristor that exhibits tunable STP and a conductance response modeled by a double exponential decay. We derive a stochastic model of the device from experimental data and investigate how device stochasticity, STP, and the double exponential decay affect accuracy in a hierarchy of time-surfaces (HOTS) architecture. We found that the device's stochasticity does not affect accuracy, that STP can reduce the effect of salt and pepper noise in signals from event-based sensors, and that the double exponential decay improves accuracy by integrating temporal information over multiple time scales. Our approach can be generalized to study other memristive devices to build a better understanding of how control over temporal dynamics can enable neuromorphic engineers to fine-tune devices and architectures to fit their problems at hand.

Submitted 15 April, 2024; v1 submitted 29 January, 2022; originally announced January 2022.

arXiv:2201.11332 [pdf, other]

doi 10.1145/3485447.3511921

Ontology-enhanced Prompt-tuning for Few-shot Learning

Authors: Hongbin Ye, Ningyu Zhang, Shumin Deng, Xiang Chen, Hui Chen, Feiyu Xiong, Xi Chen, Huajun Chen

Abstract: Few-shot Learning (FSL) aims to make predictions based on a limited number of samples. Structured data such as knowledge graphs and ontology libraries has been leveraged to benefit the few-shot setting in various tasks. However, the priors adopted by the existing methods suffer from the challenges of knowledge missing, knowledge noise, and knowledge heterogeneity, which hinder the performance of few-shot learning. In this study, we explore knowledge injection for FSL with pre-trained language models and propose ontology-enhanced prompt-tuning (OntoPrompt). Specifically, we develop the ontology transformation based on the external knowledge graph to address the knowledge missing issue, which fulfills and converts structure knowledge to text. We further introduce span-sensitive knowledge injection via a visible matrix to select informative knowledge to handle the knowledge noise issue. To bridge the gap between knowledge and text, we propose a collective training algorithm to optimize representations jointly. We evaluate our proposed OntoPrompt in three tasks, including relation extraction, event extraction, and knowledge graph completion, with eight datasets. Experimental results demonstrate that our approach can obtain better few-shot performance than baselines.

Submitted 27 January, 2022; originally announced January 2022.

Comments: Accepted by WWW2022

arXiv:2201.06206 [pdf, other]

SQUIRE: A Sequence-to-sequence Framework for Multi-hop Knowledge Graph Reasoning

Authors: Yushi Bai, Xin Lv, Juanzi Li, Lei Hou, Yincen Qu, Zelin Dai, Feiyu Xiong

Abstract: Multi-hop knowledge graph (KG) reasoning has been widely studied in recent years to provide interpretable predictions on missing links with evidential paths. Most previous works use reinforcement learning (RL) based methods that learn to navigate the path towards the target entity. However, these methods suffer from slow and poor convergence, and they may fail to infer a certain path when there is a missing edge along the path. Here we present SQUIRE, the first Sequence-to-sequence based multi-hop reasoning framework, which utilizes an encoder-decoder Transformer structure to translate the query to a path. Our framework brings about two benefits: (1) It can learn and predict in an end-to-end fashion, which gives better and faster convergence; (2) Our Transformer model does not rely on existing edges to generate the path, and has the flexibility to complete missing edges along the path, especially in sparse KGs. Experiments on standard and sparse KGs show that our approach yields significant improvement over prior methods, while converging 4x-7x faster.

Submitted 31 October, 2022; v1 submitted 16 January, 2022; originally announced January 2022.

Comments: EMNLP 2022. Code is available at https://github.com/bys0318/SQUIRE

arXiv:2201.03335 [pdf, other]

DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population

Authors: Ningyu Zhang, Xin Xu, Liankuan Tao, Haiyang Yu, Hongbin Ye, Shuofei Qiao, Xin Xie, Xiang Chen, Zhoubo Li, Lei Li, Xiaozhuan Liang, Yunzhi Yao, Shumin Deng, Peng Wang, Wen Zhang, Zhenru Zhang, Chuanqi Tan, Qiang Chen, Feiyu Xiong, Fei Huang, Guozhou Zheng, Huajun Chen

Abstract: We present an open-source and extensible knowledge extraction toolkit DeepKE, supporting complicated low-resource, document-level and multimodal scenarios in the knowledge base population. DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction. With a unified framework, DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements. Specifically, DeepKE not only provides various functional modules and model implementation for different tasks and scenarios but also organizes all components by consistent frameworks to maintain sufficient modularity and extensibility. We release the source code on GitHub at https://github.com/zjunlp/DeepKE with Google Colab tutorials and comprehensive documents for beginners. Besides, we present an online system at http://deepke.openkg.cn/EN/re_doc_show.html for real-time extraction of various tasks, and a demo video.

Submitted 18 September, 2023; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: Accepted by EMNLP 2022 System Demonstrations and the project website is http://deepke.zjukg.cn/

arXiv:2112.08589 [pdf, other]

Knowledge Graph Embedding in E-commerce Applications: Attentive Reasoning, Explanations, and Transferable Rules

Authors: Wen Zhang, Shumin Deng, Mingyang Chen, Liang Wang, Qiang Chen, Feiyu Xiong, Xiangwen Liu, Huajun Chen

Abstract: Knowledge Graphs (KGs), representing facts as triples, have been widely adopted in many applications. Reasoning tasks such as link prediction and rule induction are important for the development of KGs. Knowledge Graph Embeddings (KGEs), which embed entities and relations of a KG into continuous vector spaces, have been proposed for these reasoning tasks and proven to be efficient and robust. But the plausibility and feasibility of applying and deploying KGEs in real-world applications has not been well explored. In this paper, we discuss and report our experiences of deploying KGEs in a real domain application: e-commerce. We first identify three important desiderata for e-commerce KG systems: 1) attentive reasoning, reasoning over a few target relations of more concern instead of all; 2) explanation, providing explanations for a prediction to help both users and business operators understand why the prediction is made; 3) transferable rules, generating reusable rules to accelerate the deployment of a KG to new systems. While no existing KGE could meet all these desiderata, we propose a novel one, an explainable knowledge graph attention network that makes predictions through modeling correlations between triples rather than purely relying on its head entity, relation and tail entity embeddings. It can automatically select attentive triples for prediction and record their contribution at the same time, from which explanations can be easily provided and transferable rules can be efficiently produced. We empirically show that our method is capable of meeting all three desiderata in our e-commerce application and outperforms typical baselines on datasets from real domain applications.

Submitted 15 December, 2021; originally announced December 2021.

Comments: Accepted at IJCKG2021

arXiv:2108.03989 [pdf, other]

Spatial-Temporal Deep Intention Destination Networks for Online Travel Planning

Authors: Yu Li, Fei Xiong, Ziyi Wang, Zulong Chen, Chuanfei Xu, Yuyu Yin, Li Zhou

Abstract: Nowadays, artificial neural networks are widely used for users' online travel planning. Personalized travel planning has many real applications and is affected by various factors, such as transportation type, intention destination estimation, budget limit and crowdedness prediction. Among those factors, users' intention destination prediction is an essential task in online travel platforms. The reason is that the user may be interested in the travel plan only when the plan matches his real intention destination. Therefore, in this paper, we focus on predicting users' intention destinations in online travel platforms. In detail, we act as online travel platforms (such as Fliggy and Airbnb) to recommend travel plans for users, and the plan consists of various vacation items including hotel packages, scenic packages and so on. Predicting the actual intention destination in travel planning is challenging. Firstly, users' intention destination is highly related to their travel status (e.g., planning for a trip or finishing a trip). Secondly, users' actions (e.g., clicking, searching) over different product types (e.g., train tickets, visa application) have different indications in destination prediction. Thirdly, users may mostly visit the travel platforms just before public holidays, and thus user behaviors in online travel platforms are more sparse, low-frequency and long-period. Therefore, we propose Deep Multi-Sequences fused neural Networks (DMSN) to predict intention destinations from fused multi-behavior sequences. Real datasets are used to evaluate the performance of our proposed DMSN models. Experimental results indicate that the proposed DMSN models can achieve high intention destination prediction accuracy.

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2106.09876 [pdf, other]

doi 10.1109/TKDE.2021.3124061

Anomaly Detection in Dynamic Graphs via Transformer

Authors: Yixin Liu, Shirui Pan, Yu Guang Wang, Fei Xiong, Liang Wang, Qingfeng Chen, Vincent CS Lee

Abstract: Detecting anomalies for dynamic graphs has drawn increasing attention due to their wide applications in social networks, e-commerce, and cybersecurity. Recent deep learning-based approaches have shown promising results over shallow methods. However, they fail to address two core challenges of anomaly detection in dynamic graphs: the lack of informative encoding for unattributed nodes and the difficulty of learning discriminative knowledge from coupled spatial-temporal dynamic graphs. To overcome these challenges, in this paper, we present a novel Transformer-based Anomaly Detection framework for DYnamic graphs (TADDY). Our framework constructs a comprehensive node encoding strategy to better represent each node's structural and temporal roles in an evolving graph stream. Meanwhile, TADDY captures informative representation from dynamic graphs with coupled spatial-temporal patterns via a dynamic graph transformer model. The extensive experimental results demonstrate that our proposed TADDY framework outperforms the state-of-the-art methods by a large margin on six real-world datasets.

Submitted 27 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: 13 pages, 5 figures

arXiv:2106.03446 [pdf, other]

Controlling the dynamics of open quantum systems with periodic driving field

Authors: Fei-Lei Xiong, Wei-Min Zhang

Abstract: In this paper, we extend the study of the exact dynamics of open quantum systems to the case with a periodic driving field. It is shown that, unlike the static adjustment of the system on-site energy, which can either generate or destroy the dissipationless localized bound states, the periodic driving can either preserve the existing localized bound states or destroy some of them but cannot generate new localized bound states. With the picture of energy transfer involved with the driving field, we find the condition for the survival of the localized bound states when the driving amplitude is weak. For the strong driving case, the condition breaks down because of the strong energy renormalization to the originally existing localized bound states. These properties of decoherence dynamics may help in controlling the quantum state against decoherence, owing to its sensitivity to the fundamental frequency of the driving field.

Submitted 7 June, 2021; originally announced June 2021.

Comments: 7 pages, 5 figures

arXiv:2105.13555 [pdf, other]

doi 10.1103/PhysRevApplied.16.L011003

Lens-free Optical Detection of Thermal Motion of a Sub-millimeter Sphere Diamagnetically Levitated in High Vacuum

Authors: Fang Xiong, Peiran Yin, Tong Wu, Han Xie, Rui Li, Yingchun Leng, Yanan Li, Changkui Duan, Xi Kong, Pu Huang, Jiangfeng Du

Abstract: Levitated oscillators with millimeter or sub-millimeter size are particularly attractive due to their potential role in studying various fundamental problems and practical applications. One of the crucial issues towards these goals is to achieve efficient measurements of oscillator motion, which remains a challenge. Here we theoretically propose a lens-free optical detection scheme, which can be used to detect the motion of a millimeter or sub-millimeter levitated oscillator with a measurement efficiency close to the standard quantum limit with a modest optical power. We demonstrate this scheme experimentally on a 0.5 mm diameter micro-sphere that is diamagnetically levitated under high vacuum and room temperature, and the thermal motion is detected with high precision. Based on this system, an estimated acceleration sensitivity of $9.7 \times 10^{-10}\rm g/\sqrt{Hz}$ is achieved, which is more than one order of magnitude better than the best value reported for levitated mechanical systems. Due to the stability of the system, the minimum resolved acceleration of $3.5\times 10^{-12}\rm g$ is reached with measurement times of $10^5$ s. This result is expected to have potential applications in the study of exotic interactions in the millimeter or sub-millimeter range and the realization of compact gravimeters and accelerometers.

Submitted 27 May, 2021; originally announced May 2021.

Comments: Physical Review Applied (to be published)

Journal ref: Phys. Rev. Applied 16, 011003 (2021)

arXiv:2105.05473 [pdf, other]

Interpretable performance analysis towards offline reinforcement learning: A dataset perspective

Authors: Chenyang Xi, Bo Tang, Jiajun Shen, Xinfu Liu, Feiyu Xiong, Xueying Li

Abstract: Offline reinforcement learning (RL) has increasingly become a focus of artificial intelligence research due to its wide real-world applications where the collection of data may be difficult, time-consuming, or costly. In this paper, we first propose a two-fold taxonomy for existing offline RL algorithms from the perspective of exploration and exploitation tendency. Secondly, we derive the explicit expression of the upper bound of extrapolation error and explore the correlation between the performance of different types of algorithms and the distribution of actions under states. Specifically, we relax the strict assumption of a sufficiently large number of state-action tuples. Accordingly, we provably explain why batch constrained Q-learning (BCQ) performs better than other existing techniques. Thirdly, after identifying the weakness of BCQ on datasets with low mean episode returns, we propose a modified variant based on a top return selection mechanism, which is proved to be able to achieve state-of-the-art performance on various datasets. Lastly, we create a benchmark platform on the Atari domain, entitled RL easy go (RLEG), at an estimated cost of more than 0.3 million dollars. We make it open-source for fair and comprehensive competitions between offline RL algorithms, with complete datasets and checkpoints provided.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2103.13038 [pdf, other]

doi 10.1103/PhysRevB.104.115151

Spin susceptibilities in magnetic type-I and type-II Weyl semimetals

Authors: Feng Xiong, Xingjie Han, Carsten Honerkamp

Abstract: We investigate interacting spin susceptibilities in lattice models for $\mathcal{T}$-reversal symmetry-broken Weyl semimetals. We employ a random phase approximation (RPA) method for the spin-SU(2)-symmetry-broken case that includes mixtures of ladder and bubble diagrams, going beyond the SU(2)-symmetric case. Within this approach, the relations between the tendency towards magnetic order and the band structure tilt parameter $γ$ under different temperatures are explored. The critical interaction strength $U_c$ for magnetic ordering decreases as the tilt term changes from type-I Weyl semimetals to type-II. The lower the temperature, the sharper is the drop in $U_c$ at the critical point between them. The variation of $U_c$ with a slight doping near half-filling is also studied. It is generally found that these Weyl systems show a strongly anisotropic spin response with an enhanced doubly degenerate transverse susceptibility perpendicular to the tilt direction, inherited from the $\mathcal{C}_{4z}$ rotational symmetry of the bare Hamiltonian, but with the longitudinal response suppressed with respect to that. For small tilts $γ$ and strong enough interaction, we find two degenerate ordering patterns with spin order orthogonal to the tilt direction but much shorter spin correlation length parallel to the spin direction. With increasing tilt, the system develops instabilities with respect to in-plane magnetic orders with wavevector $(0,π, q_z)$ and $(π,0, q_z)$, with $q_z$ increasing from 0 to $π$ before the transition to a type-II Weyl semimetal is reached. These results indicate a greater richness of magnetic phases in correlated Weyl semimetals, which also poses challenges for precise theoretical descriptions.

Submitted 24 March, 2021; originally announced March 2021.

Comments: 10 pages, 8 figures

Journal ref: Phys. Rev. B 104, 115151 (2021)

arXiv:2102.10586 [pdf, other]

Generating Majorana qubit coherence in Majorana Aharonov-Bohm interferometer

Authors: Fei-Lei Xiong, Hon-Lam Lai, Wei-Min Zhang

Abstract: We propose an Aharonov-Bohm interferometer consisting of two topological superconducting chains (TSCs) to generate coherence of Majorana qubits, where each qubit is made of two Majorana zero modes (MZMs) with definite fermion parity. We obtain the generalized exact master equation as well as its solution and study the real-time dynamics of the MZM qubit states under various operations. We demonstrate that by tuning the magnetic flux, the decoherence rates can be modified significantly, and dissipationless MZMs can be generated. By applying a bias voltage to the leads, one can manipulate MZM qubit coherence and generate a nearly pure superposition state of the Majorana qubit. Moreover, parity flipping between MZM qubits with different fermion parities can be realized by controlling the coupling between the leads and the TSCs through gate voltages.

Submitted 21 February, 2021; originally announced February 2021.

Comments: 8 pages, 5 figures

Showing 1–50 of 104 results for author: Xiong, F