-
FuncEvalGMN: Evaluating Functional Correctness of SQL via Graph Matching Network
Authors:
Yi Zhan,
Yang Sun,
Han Weng,
Longjie Cui,
Guifeng Wang,
Jiajun Xie,
Yu Tian,
Xiaoming Yin,
Boyi Liu,
Dongchi Huang
Abstract:
In this paper, we propose a novel graph-based methodology to evaluate the functional correctness of SQL generation. Conventional metrics for assessing SQL code generation, such as matching-based and execution-based methods (e.g., exact set match and execution accuracy), are subject to two primary limitations. First, matching-based methods fail to assess functional correctness effectively, as different SQL queries may have identical functionality. Second, execution-based methods are susceptible to false positives, since non-equivalent queries can return identical results on a given test database. Our proposed evaluation method, \texttt{FuncEvalGMN}, does not depend on sufficiently prepared test data, and it enables precise testing of the functional correctness of the code. First, we parse SQL into a relational operator tree (ROT) called \textit{Relnode}, which contains rich semantic information from the perspective of logical execution. Then, we introduce a GNN-based approach for predicting the functional correctness of generated SQL. This approach incorporates global positional embeddings to address the loss of topological information in conventional graph matching frameworks. As an auxiliary contribution, we propose a rule-based matching algorithm, Relnode Partial Matching (\texttt{RelPM}), as a baseline. Finally, we contribute a dataset, \texttt{Pair-Aug-Spider}, with a training set and two testing sets, each comprising pairs of SQL queries that simulate various SQL code evaluation scenarios. The training set and one testing set focus on code generation using large language models (LLMs), while the other emphasizes SQL equivalence rewriting.
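The two failure modes of conventional metrics can be reproduced with a toy database. The sketch below is illustrative only: `exact_match` is a naive whitespace- and case-normalized string comparison standing in for matching-based metrics (Spider's exact set match compares parsed query components, which this does not do), and the `emp` table and all queries are hypothetical.

```python
import sqlite3


def _normalize(query: str) -> str:
    # Lower-case and collapse whitespace; no parsing of SQL structure.
    return " ".join(query.lower().split())


def exact_match(q1: str, q2: str) -> bool:
    """Naive string-level stand-in for a matching-based metric."""
    return _normalize(q1) == _normalize(q2)


def execution_match(conn: sqlite3.Connection, q1: str, q2: str) -> bool:
    """Execution-based check: compare result multisets, ignoring row order."""
    r1 = sorted(conn.execute(q1).fetchall())
    r2 = sorted(conn.execute(q2).fetchall())
    return r1 == r2


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("a", 100), ("b", 200)])

gold = "SELECT name FROM emp WHERE salary > 150"
equivalent = "SELECT name FROM emp WHERE 150 < salary"  # same semantics
wrong = "SELECT name FROM emp WHERE salary >= 200"      # different semantics

print(exact_match(gold, equivalent))       # false negative: equivalent query rejected
print(execution_match(conn, gold, wrong))  # false positive: data cannot tell them apart

# A distinguishing row exposes the wrong query.
conn.execute("INSERT INTO emp VALUES ('c', 175)")
print(execution_match(conn, gold, wrong))
```

Whether execution accuracy catches the wrong query depends entirely on the test data, which is the dependence the abstract says FuncEvalGMN avoids.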
Submitted 8 July, 2024;
originally announced July 2024.
-
Physics-guided Active Sample Reweighting for Urban Flow Prediction
Authors:
Wei Jiang,
Tong Chen,
Guanhua Ye,
Wentao Zhang,
Lizhen Cui,
Zi Huang,
Hongzhi Yin
Abstract:
Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services such as buses, taxis, and ride-sharing, and data-driven models have become the most popular solution over the past decade. Meanwhile, the implicitly learned mapping from historical observations to prediction targets tends to oversimplify the dynamics of real-world urban flows, leading to suboptimal predictions. Some recent spatio-temporal prediction solutions offer remedies through physics-guided machine learning (PGML), which describes spatio-temporal data with nuanced and principled physical laws, thus enhancing both prediction accuracy and interpretability. However, these spatio-temporal PGML methods rest on the strong assumption that the observed data fully conforms to the differential equations that define the physical system, an assumption that can quickly become ill-posed in urban flow prediction tasks. The observed urban flow data, especially when sliced into time-dependent snapshots to facilitate prediction, is typically incomplete, sparse, and prone to noise incurred during collection. As a result, such physical inconsistency between the data and the PGML model significantly limits the predictive power and robustness of the solution. Moreover, due to interval-based predictions and the intermittent nature of data recording in many transportation services, the instantaneous dynamics of urban flows can hardly be captured, making differential-equation-based continuous modeling a poor fit for this setting. To overcome these challenges, we develop a discretized physics-guided network (PN) and propose a data-aware framework, Physics-guided Active Sample Reweighting (P-GASR), to enhance PN. Experimental results on four real-world datasets demonstrate that our method achieves state-of-the-art performance with a demonstrable improvement in robustness.
Submitted 18 July, 2024;
originally announced July 2024.
-
Quantum Vicsek Model for Active Matter
Authors:
Hong Yuan,
L. X. Cui,
L. T. Chen,
C. P. Sun
Abstract:
We propose a quantum analog of the Vicsek model, consisting of an ensemble of overdamped spin-$1/2$ particles with ferromagnetic couplings, driven by a uniformly polarized magnetic field. The spontaneous magnetization of the spin components breaks the $SO(3)$ (or $SO(2)$) symmetry, inducing an ordered phase of flocking. We derive the hydrodynamic equations, similar to those formulated by Toner and Tu, by applying a mean-field approximation to the quantum analog model up to the next leading order. Our investigation not only establishes a microscopic connection between the Vicsek model and the Toner-Tu hydrodynamics for active matter, but also aims to inspire further studies of active matter in the quantum regime.
Submitted 13 July, 2024;
originally announced July 2024.
-
Optimal radio labeling for the Cartesian product of square mesh networks and stars
Authors:
Linlin Cui,
Feng Li
Abstract:
Channels, the most critical component of the communication process, have a great impact on the communication quality of a network. As networks continue to expand, limited channel resources constrain the scale of communication networks. Therefore, achieving reasonable channel assignment and utilization becomes an extremely challenging problem. To solve this issue effectively, the channel assignment problem in communication networks can be transformed into a graph labeling problem, using graphs to model the communication networks. In this paper, we study the topologies of mesh networks and stars by constructing their Cartesian product, and obtain a lower bound on, and the exact value of, the optimal radio labeling of the Cartesian product of a square mesh network and a star, $G=P(m,m)\Box K_{1,n}$, where $m\geq 2$.
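For concreteness, a radio labeling $f$ of a connected graph $G$ is usually defined by the condition $|f(u)-f(v)| \geq \operatorname{diam}(G) + 1 - d(u,v)$ for all distinct vertices $u, v$. The sketch below builds $P(m,m)\Box K_{1,n}$ (assuming the mesh $P(m,m)$ is the grid $P_m \Box P_m$) and checks that condition by brute force. It does not compute the optimal labeling from the paper; the helper names are hypothetical, and the labeling used is a trivially valid one chosen only to exercise the checker.

```python
from collections import deque
from itertools import combinations, product


def path(m):
    """Path P_m on vertices 0..m-1 as an adjacency dict."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < m] for i in range(m)}


def star(n):
    """Star K_{1,n}: center 0 joined to leaves 1..n."""
    adj = {0: list(range(1, n + 1))}
    for leaf in range(1, n + 1):
        adj[leaf] = [0]
    return adj


def cartesian_product(g, h):
    """G □ H: (u, v) ~ (u', v') iff u == u' and v ~ v' in H, or v == v' and u ~ u' in G."""
    return {(u, v): [(u, w) for w in h[v]] + [(x, v) for x in g[u]]
            for u, v in product(g, h)}


def bfs(adj, src):
    """Shortest-path distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def all_distances(adj):
    return {v: bfs(adj, v) for v in adj}


def diameter(adj):
    return max(max(d.values()) for d in all_distances(adj).values())


def is_radio_labeling(adj, f):
    """True iff |f(u) - f(v)| >= diam(G) + 1 - d(u, v) for every pair of distinct vertices."""
    dist = all_distances(adj)
    diam = max(max(d.values()) for d in dist.values())
    return all(abs(f[u] - f[v]) >= diam + 1 - dist[u][v]
               for u, v in combinations(adj, 2))


# G = P(3,3) □ K_{1,2}
G = cartesian_product(cartesian_product(path(3), path(3)), star(2))
diam = diameter(G)
# Valid but far from optimal: any two labels differ by at least diam.
f = {v: i * diam for i, v in enumerate(G)}
print(len(G), diam, is_radio_labeling(G, f), max(f.values()))
```

A checker like this is mainly useful for sanity-testing a claimed optimal labeling on small $m$ and $n$; finding the optimum by search quickly becomes infeasible as the graph grows.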
Submitted 28 June, 2024;
originally announced June 2024.
-
Multi-modal Food Recommendation using Clustering and Self-supervised Learning
Authors:
Yixin Zhang,
Xin Zhou,
Qianwen Meng,
Fanglin Zhu,
Yonghui Xu,
Zhiqi Shen,
Lizhen Cui
Abstract:
Food recommendation systems serve as pivotal components in the realm of digital lifestyle services, designed to assist users in discovering recipes and food items that resonate with their unique dietary predilections. Typically, multi-modal descriptions offer an exhaustive profile for each recipe, thereby ensuring recommendations that are both personalized and accurate. Our preliminary investigation of two datasets indicates that pre-trained multi-modal dense representations might cause a deterioration in performance compared to ID features when encapsulating interactive relationships. This observation implies that ID features possess a relative superiority in modeling interactive collaborative signals. Consequently, contemporary cutting-edge methodologies augment ID features with multi-modal information as supplementary features, overlooking the latent semantic relations between recipes. To rectify this, we present CLUSSL, a novel food recommendation framework that employs clustering and self-supervised learning. Specifically, CLUSSL constructs a modality-specific graph tailored to each modality with discrete/continuous features, thereby transforming semantic features into structural representations. Furthermore, CLUSSL obtains recipe representations pertinent to different modalities via graph convolutional operations. A self-supervised learning objective is proposed to foster independence between recipe representations derived from different unimodal graphs. Comprehensive experiments on real-world datasets show that CLUSSL consistently outperforms state-of-the-art recommendation baselines.
Submitted 27 June, 2024;
originally announced June 2024.
-
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems
Authors:
Hung Vinh Tran,
Tong Chen,
Quoc Viet Hung Nguyen,
Zi Huang,
Lizhen Cui,
Hongzhi Yin
Abstract:
Since the creation of the Web, recommender systems (RSs) have been an indispensable mechanism for information filtering. State-of-the-art RSs primarily depend on categorical features encoded by embedding vectors, resulting in excessively large embedding tables. To prevent over-parameterized embedding tables from harming scalability, both academia and industry have seen increasing efforts in compressing RS embeddings. However, despite the prosperity of lightweight embedding-based RSs (LERSs), evaluation protocols vary widely, making it difficult to relate LERS performance to real-world usability. Moreover, despite the common goal of lightweight embeddings, each LERS is typically evaluated on only one of the two main recommendation tasks: collaborative filtering and content-based recommendation. This lack of discussion on cross-task transferability hinders the development of unified, more scalable solutions. Motivated by these issues, this study investigates the performance, efficiency, and cross-task transferability of various LERSs via a thorough benchmarking process. Additionally, we propose an efficient embedding compression method using magnitude pruning, an easy-to-deploy yet highly competitive baseline that outperforms various complex LERSs. Our study reveals the distinct performance of LERSs across the two tasks, shedding light on their effectiveness and generalizability. To support edge-based recommendations, we tested all LERSs on a Raspberry Pi 4, which exposes their efficiency bottlenecks. Finally, we conclude this paper with critical summaries of LERS performance, model selection suggestions, and underexplored challenges around LERSs for future research. To encourage future research, we publish source code and artifacts at https://github.com/chenxing1999/recsys-benchmark.
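Magnitude pruning of an embedding table can be sketched in a few lines. This is a generic illustration under stated assumptions, not the paper's method: it prunes globally across the whole table (rather than per row), removes a fixed fraction `sparsity` of the smallest-magnitude weights in one shot, and omits any fine-tuning or sparse storage format the authors may use. The function name `magnitude_prune` is hypothetical.

```python
def magnitude_prune(table, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest |weight|.

    table: embedding table as a list of rows (one row per feature ID).
    Pruning is global across the whole table; ties are broken by position.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    entries = sorted((abs(w), i, j)
                     for i, row in enumerate(table)
                     for j, w in enumerate(row))
    k = int(len(entries) * sparsity)  # number of entries to remove
    pruned = [list(row) for row in table]
    for _, i, j in entries[:k]:
        pruned[i][j] = 0.0
    return pruned


table = [[0.1, -2.0],
         [0.5, -0.05]]
print(magnitude_prune(table, 0.5))  # → [[0.0, -2.0], [0.5, 0.0]]
```

Sorting by index as a tie-breaker guarantees exactly `k` entries are removed, whereas a plain threshold on `|w|` can remove more when several weights share the threshold value.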
Submitted 25 June, 2024;
originally announced June 2024.
-
Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning
Authors:
Sen Yang,
Leyang Cui,
Deng Cai,
Xinting Huang,
Shuming Shi,
Wai Lam
Abstract:
Iterative preference learning, though it yields superior performance, requires online annotated preference labels. In this work, we study strategies for selecting response pairs worth annotating, enabling cost-efficient annotation while achieving performance competitive with, or even better than, the random-selection baseline for iterative preference learning. Building on assumptions regarding uncertainty and distribution shift, we propose a comparative view that ranks the implicit reward margins predicted by DPO to select the response pairs that yield more benefit. Through extensive experiments, we show that annotating response pairs with small margins is generally better than annotating pairs with large margins or randomly selected pairs, in both single- and multi-iteration scenarios. In addition, our empirical results suggest allocating more of the annotation budget to earlier iterations rather than later ones across multiple iterations.
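In DPO, the implicit reward of a response $y$ to prompt $x$ is $r(x,y) = \beta \log \frac{\pi_\theta(y\mid x)}{\pi_{\text{ref}}(y\mid x)}$ (up to a prompt-only constant that cancels in a margin). Because the preferred response is unknown before annotation, the sketch below ranks candidate pairs by the absolute margin $|r(x,y_a) - r(x,y_b)|$ and spends the budget on the smallest ones. It assumes sequence log-probabilities are already available; `CandidatePair`, `select_for_annotation`, and the numbers are hypothetical, and the paper's exact ranking procedure may differ.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """Two sampled responses to one prompt, with sequence log-probs under both models."""
    prompt_id: str
    policy_logp_a: float  # log pi_theta(y_a | x)
    ref_logp_a: float     # log pi_ref(y_a | x)
    policy_logp_b: float
    ref_logp_b: float


def implicit_reward(policy_logp, ref_logp, beta):
    """DPO implicit reward beta * log(pi_theta / pi_ref), up to a prompt-only constant."""
    return beta * (policy_logp - ref_logp)


def abs_margin(pair, beta):
    # The preference label is unknown before annotation, so use |r(a) - r(b)|.
    return abs(implicit_reward(pair.policy_logp_a, pair.ref_logp_a, beta)
               - implicit_reward(pair.policy_logp_b, pair.ref_logp_b, beta))


def select_for_annotation(pairs, budget, beta=0.1):
    """Spend the annotation budget on the pairs with the smallest margins."""
    return sorted(pairs, key=lambda p: abs_margin(p, beta))[:budget]


pairs = [
    CandidatePair("p1", -10.0, -10.0, -12.0, -10.0),  # margin 0.2 (beta = 0.1)
    CandidatePair("p2", -5.0, -10.0, -10.0, -10.0),   # margin 0.5
    CandidatePair("p3", -10.0, -10.0, -10.5, -10.0),  # margin 0.05
]
print([p.prompt_id for p in select_for_annotation(pairs, budget=2)])  # → ['p3', 'p1']
```

Small margins mark pairs on which the current policy is least certain, which matches the uncertainty-based motivation stated in the abstract.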
Submitted 25 June, 2024;
originally announced June 2024.
-
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Authors:
Deng Cai,
Huayang Li,
Tingchen Fu,
Siheng Li,
Weiwen Xu,
Shuaiyi Li,
Bowen Cao,
Zhisong Zhang,
Xinting Huang,
Leyang Cui,
Yan Wang,
Lemao Liu,
Taro Watanabe,
Shuming Shi
Abstract:
Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation directions, each of which facilitates a variety of applications. Our work offers a holistic view that unifies numerous existing studies and suggests potential research directions. We envision our work as a useful roadmap for future research on LLMs.
Submitted 24 June, 2024;
originally announced June 2024.
-
Guiding LLM Temporal Logic Generation with Explicit Separation of Data and Control
Authors:
William Murphy,
Nikolaus Holzer,
Nathan Koenig,
Leyi Cui,
Raven Rothkopf,
Feitong Qiao,
Mark Santolucito
Abstract:
Temporal logics are powerful tools that are widely used for the synthesis and verification of reactive systems. The recent progress on Large Language Models (LLMs) has the potential to make the process of writing such specifications more accessible. However, writing specifications in temporal logics remains challenging for all but the most expert users. A key question in using LLMs for temporal logic specification engineering is what kind of guidance most helps the LLM and its users produce specifications easily. Looking specifically at the problem of reactive program synthesis, we explore the impact of providing an LLM with guidance on the separation of control and data: making explicit for the LLM which functionality is relevant to the specification, and treating the remaining functionality as an implementation detail for a series of predefined functions and predicates. We present a benchmark set and find that this separation of concerns improves specification generation. Our benchmark provides a test set against which future work on LLM generation of temporal logic specifications can be evaluated.
Submitted 11 June, 2024;
originally announced June 2024.
-
Inference Attacks: A Taxonomy, Survey, and Promising Directions
Authors:
Feng Wu,
Lei Cui,
Shaowen Yao,
Shui Yu
Abstract:
The prosperity of machine learning has also raised concerns about data privacy. In particular, inference attacks can cause privacy breaches in various MLaaS scenarios and in both the model training and prediction phases. Specifically, inference attacks can infer private information about an undisclosed target training set from the outputs of the target model, including but not limited to statistics, membership, semantics, and data representations. For instance, an attacker may infer whether the target data has characteristics associated with AIDS. In addition, the rapid development of the machine learning community in recent years, especially the surge of model types and application scenarios, has further stimulated research on inference attacks. Thus, studying and analyzing inference attacks in depth is urgent and significant. However, a systematic discussion of inference attacks from the perspectives of taxonomy, global view, attack, and defense is still lacking. This survey provides an in-depth and comprehensive review of inference attacks and corresponding countermeasures in ML-as-a-service, based on a taxonomy and the latest research. Without compromising researchers' intuition, we first propose the 3MP taxonomy based on the community's research status, aiming to normalize the confusing naming system of inference attacks. We also analyze the pros and cons of each type of inference attack, their workflows, countermeasures, and how they interact with other attacks. Finally, we point out several promising directions for researchers from a more comprehensive and novel perspective.
Submitted 27 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
HC-GAE: The Hierarchical Cluster-based Graph Auto-Encoder for Graph Representation Learning
Authors:
Zhuo Xu,
Lu Bai,
Lixin Cui,
Ming Li,
Yue Wang,
Edwin R. Hancock
Abstract:
Graph Auto-Encoders (GAEs) are powerful tools for graph representation learning. In this paper, we develop a novel Hierarchical Cluster-based GAE (HC-GAE), which can learn effective structural characteristics for graph data analysis. To this end, during the encoding process, we first use hard node assignment to decompose a sample graph into a family of separated subgraphs. We compress each subgraph into a coarsened node, transforming the original graph into a coarsened graph. During the decoding process, we adopt soft node assignment to reconstruct the original graph structure by expanding the coarsened nodes. By hierarchically performing the compressing procedure during encoding and the expanding procedure during decoding, the proposed HC-GAE can effectively extract bidirectionally hierarchical structural features of the original sample graph. Furthermore, we redesign the loss function to integrate information from both the encoder and the decoder. Since the graph convolution operation of the proposed HC-GAE is restricted to each individual separated subgraph and cannot propagate node information between different subgraphs, HC-GAE can significantly reduce the over-smoothing problem arising in classical convolution-based GAEs. The proposed HC-GAE can generate effective representations for either node classification or graph classification, and experiments on real-world datasets demonstrate its effectiveness.
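The encoder's hard-assignment step can be pictured as follows. With a one-hot assignment matrix $S$, the coarsened adjacency is the off-diagonal part of $S^\top A S$ (inter-cluster edge counts), and restricting convolution to each subgraph amounts to keeping only intra-cluster edges. This is a minimal sketch assuming a given cluster assignment and mean pooling of member features; HC-GAE learns its assignments and may pool differently, and the function names are hypothetical.

```python
def intra_cluster_adj(adj, assign):
    """Keep only edges inside a cluster, so message passing cannot cross subgraphs."""
    n = len(adj)
    return [[adj[i][j] if assign[i] == assign[j] else 0 for j in range(n)]
            for i in range(n)]


def coarsen(adj, features, assign, k):
    """Hard-assignment coarsening.

    adj: n x n symmetric 0/1 matrix; features: n rows of equal length;
    assign[i] in range(k) is node i's cluster, and every cluster is non-empty.
    """
    n = len(adj)
    # Inter-cluster edge counts: off-diagonal part of S^T A S for one-hot S.
    coarse_adj = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(n):
            if assign[i] != assign[j]:
                coarse_adj[assign[i]][assign[j]] += adj[i][j]
    # Mean-pool member features into one coarsened node per cluster.
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for i, c in enumerate(assign):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += features[i][d]
    coarse_feats = [[s / counts[c] for s in sums[c]] for c in range(k)]
    return coarse_adj, coarse_feats


# Path 0-1-2-3 split into clusters {0, 1} and {2, 3}.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
assign = [0, 0, 1, 1]
print(coarsen(adj, [[1.0], [3.0], [5.0], [7.0]], assign, k=2))
print(intra_cluster_adj(adj, assign))
```

In this example the only edge that survives coarsening is the one between nodes 1 and 2, and it is exactly the edge removed by `intra_cluster_adj`.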
Submitted 23 May, 2024;
originally announced May 2024.
-
Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text
Authors:
Yafu Li,
Zhilin Wang,
Leyang Cui,
Wei Bi,
Shuming Shi,
Yue Zhang
Abstract:
AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. However, little work has been devoted to detecting (partially) AI-paraphrased texts, even though AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), which aims to identify paraphrased text spans within a text. Unlike text-level detection, PTD takes in the full text and assigns each sentence a score indicating its degree of paraphrasing. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analyses explain the crucial role of the surrounding context of the paraphrased text spans. Extensive experiments show that PTD models can generalize to versatile paraphrasing prompts and multiple paraphrased text spans. We release our resources at https://github.com/Linzwcs/PASTED.
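Since PTD outputs one score per sentence, turning those scores into text spans needs a post-processing step. The sketch below assumes a simple rule (a fixed `threshold`, merging consecutive above-threshold sentences into half-open `[start, end)` spans); the function name and threshold are hypothetical and are not taken from the PTD paper.

```python
def scores_to_spans(scores, threshold=0.5):
    """Merge consecutive sentences with score >= threshold into [start, end) spans."""
    spans = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                      # a paraphrased run begins
        elif s < threshold and start is not None:
            spans.append((start, i))       # the run ends before sentence i
            start = None
    if start is not None:
        spans.append((start, len(scores)))  # run extends to the last sentence
    return spans


sentence_scores = [0.1, 0.8, 0.9, 0.2, 0.7]
print(scores_to_spans(sentence_scores))  # → [(1, 3), (4, 5)]
```

Because spans are built from sentence-level scores, a text with several paraphrased regions naturally yields several spans, matching the multi-span setting mentioned in the abstract.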
Submitted 29 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
ENADPool: The Edge-Node Attention-based Differentiable Pooling for Graph Neural Networks
Authors:
Zhehan Zhao,
Lu Bai,
Lixin Cui,
Ming Li,
Yue Wang,
Lixiang Xu,
Edwin R. Hancock
Abstract:
Graph Neural Networks (GNNs) are powerful tools for graph classification. One important operation in GNNs is downsampling, or pooling, which can learn effective embeddings from the node representations. In this paper, we propose a new hierarchical pooling operation, the Edge-Node Attention-based Differentiable Pooling (ENADPool), for GNNs to learn effective graph representations. Unlike classical hierarchical pooling operations, which rely on ambiguous node assignments and simply compute the averaged feature over the nodes of each cluster, the proposed ENADPool not only employs a hard clustering strategy to assign each node to a unique cluster, but also compresses the node features, as well as their edge connectivity strengths, into the resulting hierarchical structure using an attention mechanism after each pooling step. As a result, ENADPool simultaneously identifies the importance of different nodes within each separated cluster and of the edges between corresponding clusters, addressing the shortcoming of uniform edge-node structural information aggregation in classical hierarchical pooling operations. Moreover, to mitigate the over-smoothing problem in existing GNNs, we propose a Multi-distance GNN (MD-GNN) model equipped with the proposed ENADPool operation, allowing nodes to actively and directly receive feature information from neighbors at different random walk steps. Experiments demonstrate the effectiveness of MD-GNN with the proposed ENADPool.
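The contrast between averaging and attention-based compression within hard clusters can be sketched as below. It assumes each node already has a scalar attention score (in a real model this would come from a learned projection) and applies a softmax over the members of each cluster; equal scores reduce to the plain average used by classical pooling. This covers only the node part, not ENADPool's edge attention, and `attention_pool` is a hypothetical name.

```python
import math


def attention_pool(features, scores, assign, k):
    """Pool node features per hard cluster with softmax(score) weights.

    features: n rows of equal length; scores: n floats; assign[i] in range(k).
    """
    dim = len(features[0])
    pooled = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            raise ValueError(f"cluster {c} is empty")
        m = max(scores[i] for i in members)          # subtract max for stability
        weights = [math.exp(scores[i] - m) for i in members]
        z = sum(weights)
        pooled.append([sum(w * features[i][d] for w, i in zip(weights, members)) / z
                       for d in range(dim)])
    return pooled


# Equal scores in cluster 0 give the plain mean; cluster 1 has a single node.
print(attention_pool([[1.0], [3.0], [10.0]], [0.0, 0.0, 5.0], [0, 0, 1], k=2))
```

Unequal scores within a cluster shift the pooled vector toward the higher-scoring nodes, which is the node-importance effect the abstract attributes to the attention mechanism.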
Submitted 16 May, 2024;
originally announced May 2024.
-
Phase Retrieval from the Hong-Ou-Mandel Dip to Characterize the Phase Spectrum of Independent Pulses at the Single-Photon Level
Authors:
Yuhang Lei,
Wen Zhao,
Liang Cui,
Xiaoying Li
Abstract:
Measuring the phase spectrum at the single-photon level is essential for fully characterizing the temporal-spectral mode of quantum sources. We present a method based on a phase retrieval algorithm to recover the phase spectrum difference between two independent pulses from their Hong-Ou-Mandel interference pattern and intensity spectra. A confirmatory experiment with coherent-state pulses shows that the recovered phase spectrum difference is accurate to within ±0.1 rad. The method is readily generalizable to the measurement of single-photon wave packets and even correlated photon pairs.
Submitted 16 May, 2024;
originally announced May 2024.
-
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning
Authors:
Wenqi Dong,
Bangbang Yang,
Lin Ma,
Xiao Liu,
Liyuan Cui,
Hujun Bao,
Yuewen Ma,
Zhaopeng Cui
Abstract:
As humans, we aspire to create media content that is both freely willed and readily controlled. Thanks to the remarkable development of generative techniques, we can now easily use 2D diffusion methods to synthesize images controlled by raw sketches or designated human poses, and even progressively edit or regenerate local regions with masked inpainting. However, similar workflows in 3D modeling tasks are still unavailable due to the lack of controllability and efficiency in 3D generation. In this paper, we present a novel controllable and interactive 3D asset modeling framework, named Coin3D. Coin3D allows users to control 3D generation using a coarse geometry proxy assembled from basic shapes, and introduces an interactive generation workflow that supports seamless local part editing while delivering responsive 3D object previews within a few seconds. To this end, we develop several techniques, including a 3D adapter that applies volumetric coarse shape control to the diffusion model, a proxy-bounded editing strategy for precise part editing, a progressive volume cache to support responsive previewing, and volume-SDS to ensure consistent mesh reconstruction. Extensive experiments on interactive generation and editing with diverse shape proxies demonstrate that our method achieves superior controllability and flexibility in the 3D asset generation task.
Submitted 13 May, 2024;
originally announced May 2024.
-
Generation of Ultra-Collimated Polarized Attosecond $\gamma$-Rays via Beam Instabilities
Authors:
Li-Jie Cui,
Ke-Jia Wei,
Chong Lv,
Feng Wan,
Yousef I. Salamin,
Lei-Feng Cao,
Jian-Xing Li
Abstract:
Polarized attosecond $\gamma$-rays may enable excitation and hyperfine tracking of reactions relevant to nuclear physics, astrophysics, high-energy physics, and other fields. However, generating them with a feasible and easy-to-deploy source remains a great challenge. Here, we put forward a novel method for producing ultra-collimated, high-brilliance, polarized attosecond $\gamma$-rays via the interaction of an unpolarized electron beam with a solid-density plasma. As a relativistic electron beam enters a solid-density plasma, it can be modulated into high-density clusters via its own self-modulation instability, and further into attosecond slices due to its own hosing instability. This is accompanied by the generation of $\gamma$-ray slices of similar pulse width via nonlinear Compton scattering. The severe hosing instability breaks the symmetry of the excited electromagnetic fields, resulting in net linear polarization of the $\gamma$-ray slices, which challenges the conventional perception that the interaction of an axially symmetric unpolarized electron beam with a uniform plasma cannot generate polarized radiation. In addition, we obtain high-quality electron microbunches, which may serve as an alternative source for prebunched free-electron lasers.
Submitted 10 May, 2024;
originally announced May 2024.
-
Very Long Baseline Array Observations of Parsec-scale Radio Emission in Dual Active Galactic Nuclei
Authors:
Wancheng Xu,
Lang Cui,
Xiang Liu,
Tao An,
Hongmin Cao,
Pengfei Jiang,
Luis C. Ho,
Ning Chang,
Xiaolong Yang,
Yuling Shen,
Guiping Tan,
Zhenhua Han,
Junhui Fan,
Ming Zhang
Abstract:
Dual active galactic nuclei (dual AGN) are believed to form during galaxy mergers. Studying dual-AGN emission can provide valuable insights into galaxy merging and evolution. To investigate parsec-scale radio emission properties, we observed eight radio components of four selected dual-AGN systems using the Very Long Baseline Array (VLBA) at 5 GHz in multiple-phase-center mode. Among them, two compact radio components, labeled J0051+0020B and J2300-0005A, were clearly detected on parsec scales for the first time. However, the radio emission of the other six components was resolved out in the high-resolution images. We provide values or upper limits of the brightness temperature and radio emission power, and analyze the emission origins of each target in detail. Based on their physical properties reported in this work and in the literature, we suggest that the radio emission in J0051+0020B and J2300-0005A originates primarily from compact jets, while the other six sources show more complex emission mechanisms. In addition, our VLBA observations suggest that the systematic X-ray deficit in our dual-AGN sample is likely attributable to tidally induced effects and possible viewing-angle effects.
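A form of the brightness temperature commonly used for a Gaussian VLBI component is $T_b = 1.22\times10^{12}\,(1+z)\,\dfrac{S_\nu}{\nu^2\,\theta_{\rm maj}\,\theta_{\rm min}}$ K, with flux density $S_\nu$ in Jy, observing frequency $\nu$ in GHz, and FWHM axes $\theta$ in mas. Whether the authors use exactly this expression is an assumption; the sketch only shows how such values are computed, and the input numbers below are hypothetical, not measurements from the paper.

```python
def brightness_temperature(flux_jy, freq_ghz, theta_maj_mas, theta_min_mas, z=0.0):
    """Brightness temperature in K of a Gaussian component.

    Units: flux density in Jy, frequency in GHz, FWHM axes in mas;
    (1 + z) converts to the source rest frame.
    """
    return 1.22e12 * (1.0 + z) * flux_jy / (freq_ghz ** 2 * theta_maj_mas * theta_min_mas)


# Hypothetical 10 mJy component, 1 mas x 1 mas, observed at 5 GHz, z = 0.1.
print(f"{brightness_temperature(0.01, 5.0, 1.0, 1.0, z=0.1):.3e} K")
```

When a component is unresolved, the fitted size is only an upper bound, so the same formula then gives a lower limit on $T_b$; values well above about $10^{7}$ K are often read as evidence for non-thermal, jet-related emission.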
Submitted 7 May, 2024;
originally announced May 2024.
-
DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model
Authors:
Peijin Jia,
Tuopu Wen,
Ziang Luo,
Mengmeng Yang,
Kun Jiang,
Zhiquan Lei,
Xuewei Tang,
Ziyuan Liu,
Le Cui,
Kehua Sheng,
Bo Zhang,
Diange Yang
Abstract:
Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited utilization of structured priors inherent in map segmentation masks. In light of this, we propose DiffMap, a novel approach specifically designed to model the structured priors of map segmentation masks using a latent diffusion model. By incorporating this technique, the performance of existing semantic segmentation methods can be significantly enhanced, and certain structural errors present in the segmentation outputs can be effectively rectified. Notably, the proposed module can be seamlessly integrated into any map segmentation model, thereby augmenting its capability to accurately delineate semantic information. Furthermore, through extensive visualization analysis, our model demonstrates superior proficiency in generating results that more accurately reflect real-world map layouts, further validating its efficacy in improving the quality of the generated maps.
Submitted 3 May, 2024;
originally announced May 2024.
-
Magnetically Driven Relativistic Jet in the High-Redshift Blazar OH 471
Authors:
S. Guo,
T. An,
Y. Liu,
Y. Sotnikova,
A. Volvach,
T. Mufakharov,
L. Chen,
L. Cui,
A. Wang,
Z. Xu,
Y. Zhang,
W. Xu,
Y. A. Kovalev,
Y. Y. Kovalev,
M. Kharinov,
A. Erkenov,
T. Semenova,
L. Volvach
Abstract:
Context: Understanding the mechanisms that launch and shape powerful relativistic jets from supermassive black holes (SMBHs) in high-redshift active galactic nuclei (AGN) is crucial for probing the co-evolution of SMBHs and galaxies over cosmic time.
Aims: We study the high-redshift ($z=3.396$) blazar OH 471 to explore the jet-launching mechanism in the early Universe.
Methods: Using multi-frequency radio monitoring observations and high-resolution Very Long Baseline Interferometry imaging spanning three decades, we study the milliarcsecond structure and long-term variability of OH 471.
Results: Spectral modelling of the radio flux densities reveals a synchrotron self-absorbed spectrum, indicating strong magnetic fields within the compact core. By applying the flux-freezing approximation, we estimate the magnetic flux carried by the jet and find that it reaches or exceeds theoretical predictions for jets powered by black hole spin energy via the Blandford-Znajek mechanism. This implies that OH 471 was in a magnetically arrested disk (MAD) state, in which the magnetic flux accumulated near the horizon regulates the accretion flow, allowing efficient extraction of black hole rotational energy.
Conclusions: Our study demonstrates the dominance of MAD accretion in powering the prominent radio flares and relativistic jets observed in the radio-loud AGN OH 471. Statistical studies of large samples of high-redshift AGN will shed light on the role of MAD accretion in launching and accelerating the earliest relativistic jets.
Submitted 20 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Tests of the Kerr Hypothesis with MAXI J1803-298 Using Different RELXILL_NK Flavors
Authors:
Jie Liao,
M. Ghasemi-Nodehi,
Lang Cui,
Ashutosh Tripathi,
Yong-Feng Huang,
Xiang Liu
Abstract:
Iron line spectroscopy has been one of the leading methods not only for measuring the spins of accreting black holes but also for testing fundamental physics. Based on this method, we present an analysis of a dataset observed simultaneously by NuSTAR and NICER for the black hole binary candidate MAXI J1803-298, which shows prominent relativistic reflection features. Various relxill_nk flavors are used to test the Kerr black hole hypothesis. Our analysis provides stringent constraints on the Johannsen deformation parameter $α_{13}$, the most precise to date: $α_{13}=0.023^{+0.071}_{-0.038}$ from relxillD_nk and $α_{13}=0.006^{+0.045}_{-0.022}$ from relxillion_nk, both at the 3$σ$ credible level. The Johannsen metric reduces to the Kerr metric when $α_{13}$ vanishes. Furthermore, we investigate the best-fitting results using the Akaike Information Criterion and assess their systematic uncertainties.
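The abstract mentions model selection with the Akaike Information Criterion, AIC = 2k − 2 ln L_max, where k is the number of free parameters. For spectral fits with Gaussian errors, −2 ln L_max equals χ² plus a constant, so AIC can be compared as χ² + 2k across models fitted to the same data. The sketch below assumes that χ²-based form; the model names and χ² values are placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FitResult:
    name: str      # hypothetical model label
    chi2: float    # best-fit chi-square statistic
    n_free: int    # number of free parameters k

def aic(fit: FitResult) -> float:
    # With Gaussian errors, -2 ln L_max = chi2 + const; the constant cancels
    # when models are fitted to the same data, so AIC = chi2 + 2k suffices.
    return fit.chi2 + 2 * fit.n_free

def rank_by_aic(fits: List[FitResult]) -> List[Tuple[str, float]]:
    """Return (name, delta_AIC) pairs from the preferred model to the least preferred."""
    scores = {f.name: aic(f) for f in fits}
    best = min(scores.values())
    return sorted(((name, s - best) for name, s in scores.items()), key=lambda p: p[1])
```

For example, a fit with χ² = 1020 and 12 free parameters (AIC 1044) is slightly disfavoured relative to one with χ² = 1015 and 14 parameters (AIC 1043): the extra parameters are penalised, but here the χ² improvement outweighs the penalty.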
Submitted 9 April, 2024;
originally announced April 2024.
-
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Authors:
Wenshan Wu,
Shaoguang Mao,
Yadong Zhang,
Yan Xia,
Li Dong,
Lei Cui,
Furu Wei
Abstract:
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored. Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind's Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. VoT aims to elicit spatial reasoning in LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps. We employed VoT for multi-hop spatial reasoning tasks, including natural language navigation, visual navigation, and visual tiling in 2D grid worlds. Experimental results demonstrated that VoT significantly enhances the spatial reasoning abilities of LLMs. Notably, VoT outperformed existing multimodal large language models (MLLMs) in these tasks. While VoT works surprisingly well on LLMs, the ability to generate mental images to facilitate spatial reasoning resembles the mind's eye process, suggesting its potential viability in MLLMs.
Submitted 24 May, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
A Change of Scenery: Transformative Insights from Retrospective VR Embodied Perspective-Taking of Conflict With a Close Other
Authors:
Seraphina Yong,
Leo Cui,
Evan Suma Rosenberg,
Svetlana Yarosh
Abstract:
Close relationships are irreplaceable social resources, yet prone to high-risk conflict. Building on findings from the fields of HCI, virtual reality, and behavioral therapy, we evaluate the unexplored potential of retrospective VR-embodied perspective-taking to fundamentally influence conflict resolution between close others. We develop a biographically accurate Retrospective Embodied Perspective-Taking system (REPT) and conduct a mixed-methods evaluation of its influence on close others' reflection and communication, compared to video-based reflection methods currently used in therapy (treatment as usual, or TAU). Our key findings provide evidence that REPT significantly improved the communication skills and positive sentiment of both partners during conflict, relative to TAU. The qualitative data also indicated that REPT surpassed basic perspective-taking by exclusively stimulating users to embody and reflect on both their own and their partner's experiences at the same level. In light of these findings, we provide implications and an agenda for social embodiment in HCI design: conceptualizing the use of 'embodied social cognition,' and envisioning socially embodied experiences as an interactive context.
Submitted 2 April, 2024;
originally announced April 2024.
-
Lightweight Embeddings for Graph Collaborative Filtering
Authors:
Xurong Liang,
Tong Chen,
Lizhen Cui,
Yang Wang,
Meng Wang,
Hongzhi Yin
Abstract:
Graph neural networks (GNNs) are currently among the most performant collaborative filtering methods. Meanwhile, owing to the use of an embedding table that represents each user/item as a distinct vector, GNN-based recommenders have inherited the long-standing defect of parameter inefficiency. As a common practice for scalable embeddings, parameter sharing enables the use of fewer embedding vectors (i.e., meta-embeddings). When assigning meta-embeddings, most existing methods use a heuristically designed, predefined mapping from each user's/item's ID to the corresponding meta-embedding indexes, thus simplifying the optimization problem into learning only the meta-embeddings. However, in the context of GNN-based collaborative filtering, such a fixed mapping omits the semantic correlations between entities that are evident in the user-item interaction graph, leading to suboptimal recommendation performance. To this end, we propose Lightweight Embeddings for Graph Collaborative Filtering (LEGCF), a parameter-efficient embedding framework dedicated to GNN-based recommenders. LEGCF innovatively introduces an assignment matrix as an extra learnable component on top of meta-embeddings. To jointly optimize these two heavily entangled components, aside from learning the meta-embeddings by minimizing the recommendation loss, LEGCF further performs an efficient assignment update by enforcing a novel semantic similarity constraint and finding its closed-form solution based on the matrix pseudo-inverse. The meta-embeddings and assignment matrix are updated alternately, and the latter is sparsified on the fly to ensure negligible storage overhead. Extensive experiments on three benchmark datasets have verified that LEGCF achieves the smallest trade-off between size and performance, with consistent accuracy gains over state-of-the-art baselines. The codebase of LEGCF is available at https://github.com/xurong-liang/LEGCF.
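To make the parameter-sharing idea concrete, here is a minimal sketch, assuming each user/item embedding is a weighted sum of a few shared meta-embeddings selected by one sparse row of the assignment matrix. It shows only the lookup and a rough storage count (assuming one stored value plus one index per non-zero entry); the alternating optimization and the pseudo-inverse assignment update described in the abstract are not shown. The function names are hypothetical and are not taken from the LEGCF codebase.

```python
from typing import Dict, List, Tuple

Vector = List[float]

def compose_embedding(assignment: Dict[int, float], meta: List[Vector]) -> Vector:
    """Embedding of one user/item as a weighted sum of shared meta-embeddings.

    `assignment` is one sparse row of the assignment matrix: it maps a
    meta-embedding index to its weight (only non-zero entries are stored).
    """
    dim = len(meta[0])
    out = [0.0] * dim
    for idx, weight in assignment.items():
        for d in range(dim):
            out[d] += weight * meta[idx][d]
    return out

def parameter_count(num_entities: int, num_meta: int, dim: int, nnz_per_row: int) -> Tuple[int, int]:
    """Full embedding table vs. meta-embeddings plus a sparse assignment
    (assumption: each non-zero costs one stored weight and one stored index)."""
    full = num_entities * dim
    compressed = num_meta * dim + num_entities * nnz_per_row * 2
    return full, compressed
```

With 100,000 entities, 64-dimensional vectors, 500 meta-embeddings and two non-zeros per row, `parameter_count(100_000, 500, 64, 2)` returns `(6400000, 432000)`, which illustrates why keeping the assignment matrix sparse matters for the storage overhead.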
Submitted 28 March, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges
Authors:
Yanshen Sun,
Jianfeng He,
Limeng Cui,
Shuo Lei,
Chang-Tien Lu
Abstract:
Recent advancements in Large Language Models (LLMs) have enabled the creation of fake news, particularly in complex fields like healthcare. Studies highlight the gap in the deceptive power of LLM-generated fake news with and without human assistance, yet the potential of prompting techniques has not been fully explored. Thus, this work aims to determine whether prompting strategies can effectively narrow this gap. Current LLM-based fake news attacks require human intervention for information gathering and often miss details and fail to maintain context consistency. Therefore, to better understand threat tactics, we propose a strong fake news attack method called conditional Variational-autoencoder-Like Prompt (VLPrompt). Unlike current methods, VLPrompt eliminates the need for additional data collection while maintaining contextual coherence and preserving the intricacies of the original text. To propel future research on detecting VLPrompt attacks, we created a new dataset named VLPrompt fake news (VLPFN) containing real and fake texts. Our experiments, including various detection methods and novel human study metrics, were conducted to assess their performance on our dataset, yielding numerous findings.
Submitted 8 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
CodeS: Natural Language to Code Repository via Multi-Layer Sketch
Authors:
Daoguang Zan,
Ailun Yu,
Wei Liu,
Dong Chen,
Bo Shen,
Wei Li,
Yafen Yao,
Yongshun Gong,
Xiaolin Chen,
Bei Guan,
Zhiguang Yang,
Yongji Wang,
Qianxiang Wang,
Lizhen Cui
Abstract:
The impressive performance of large language models (LLMs) on code-related tasks has shown the potential of fully automated software development. In light of this, we introduce a new software engineering task, namely Natural Language to code Repository (NL2Repo). This task aims to generate an entire code repository from its natural language requirements. To address this task, we propose a simple yet effective framework CodeS, which decomposes NL2Repo into multiple sub-tasks by a multi-layer sketch. Specifically, CodeS includes three modules: RepoSketcher, FileSketcher, and SketchFiller. RepoSketcher first generates a repository's directory structure for given requirements; FileSketcher then generates a file sketch for each file in the generated structure; SketchFiller finally fills in the details for each function in the generated file sketch. To rigorously assess CodeS on the NL2Repo task, we carry out evaluations through both automated benchmarking and manual feedback analysis. For benchmark-based evaluation, we craft a repository-oriented benchmark, SketchEval, and design an evaluation metric, SketchBLEU. For feedback-based evaluation, we develop a VSCode plugin for CodeS and engage 30 participants in conducting empirical studies. Extensive experiments prove the effectiveness and practicality of CodeS on the NL2Repo task.
Submitted 25 March, 2024;
originally announced March 2024.
-
Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System
Authors:
Jing Li,
Lu Bai,
Bin Yang,
Chang Li,
Lingfei Ma,
Lixin Cui,
Edwin R. Hancock
Abstract:
Infrared and visible image fusion (IVF) plays an important role in intelligent transportation systems (ITS). Early works predominantly focus on boosting the visual appeal of the fused result, and only a few recent approaches have tried to combine high-level vision tasks with IVF. However, they prioritize the design of cascaded structures to seek unified, suitable features that fit different tasks. Thus, they tend to be biased toward reconstructing raw pixels without considering the significance of semantic features. Therefore, we propose a novel prior-semantic-guided image fusion method based on a dual-modality strategy, improving the performance of IVF in ITS. Specifically, to explore the independent, significant semantics of each modality, we first design two parallel semantic segmentation branches with a refined feature adaptive-modulation (RFaM) mechanism. RFaM can perceive the features that are semantically distinct enough in each semantic segmentation branch. Then, two pilot experiments based on the two branches are conducted to capture the significant prior semantics of the two images, which are then applied to guide the fusion task by integrating the semantic segmentation branches with the fusion branches. In addition, to aggregate both high-level semantics and impressive visual effects, we further investigate the frequency response of the prior semantics and propose a multi-level representation-adaptive fusion (MRaF) module to explicitly integrate the low-frequency prior semantics with the high-frequency details. Extensive experiments on two public datasets demonstrate the superiority of our method over state-of-the-art image fusion approaches in terms of both visual appeal and high-level semantics.
Submitted 24 March, 2024;
originally announced March 2024.
-
SSHPool: The Separated Subgraph-based Hierarchical Pooling
Authors:
Zhuo Xu,
Lixin Cui,
Yue Wang,
Hangyuan Du,
Lu Bai,
Edwin R. Hancock
Abstract:
In this paper, we develop a novel local graph pooling method, namely the Separated Subgraph-based Hierarchical Pooling (SSHPool), for graph classification. To this end, we commence by assigning the nodes of a sample graph to different clusters, resulting in a family of separated subgraphs. We employ local graph convolution units individually as the local structure to further compress each subgraph into a coarsened node, transforming the original graph into a coarsened graph. Since these subgraphs are separated by different clusters and structural information cannot be propagated between them, the local convolution operation can substantially mitigate the over-smoothing problem that arises in most existing Graph Neural Networks (GNNs). By hierarchically performing the proposed procedures on the resulting coarsened graph, the proposed SSHPool can effectively extract the hierarchical global features of the original graph structure, encapsulating rich intrinsic structural characteristics. Furthermore, we develop an end-to-end GNN framework associated with the proposed SSHPool module for graph classification. Experimental results demonstrate the superior performance of the proposed model on real-world datasets, significantly outperforming state-of-the-art GNN methods in terms of classification accuracy.
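The pooling step described above (cluster the nodes, convolve only inside each cluster, then compress each cluster into one node) can be sketched as follows. This is a minimal, non-learned illustration under stated assumptions: the cluster assignment is given as input, the "local convolution" is a single round of mean aggregation over intra-cluster neighbours, and the readout is a sum. SSHPool itself uses learnable convolution units and its own clustering, and `coarsen` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def coarsen(adj: Dict[int, Set[int]], feats: Dict[int, List[float]],
            cluster: Dict[int, int]) -> Tuple[Dict[int, List[float]], Dict[int, Set[int]]]:
    """One pooling step: aggregation inside each cluster, then one node per cluster."""
    # 1) Local "convolution": average each node with its neighbours in the SAME
    #    cluster only, so no information crosses cluster boundaries.
    local = {}
    for v, x in feats.items():
        group = [u for u in adj.get(v, set()) if cluster[u] == cluster[v]] + [v]
        local[v] = [sum(feats[u][d] for u in group) / len(group) for d in range(len(x))]
    # 2) Readout: sum the locally aggregated features of each cluster into one coarsened node.
    pooled: Dict[int, List[float]] = {}
    for v, x in local.items():
        c = cluster[v]
        pooled[c] = [a + b for a, b in zip(pooled[c], x)] if c in pooled else list(x)
    # 3) Coarsened edges: two clusters are linked if any original edge joins them.
    coarse_adj: Dict[int, Set[int]] = defaultdict(set)
    for v, nbrs in adj.items():
        for u in nbrs:
            if cluster[u] != cluster[v]:
                coarse_adj[cluster[v]].add(cluster[u])
    return pooled, dict(coarse_adj)
```

On a path graph 0-1-2-3 with clusters {0, 1} and {2, 3} and scalar features 1, 3, 5, 7, this yields two coarsened nodes with features 4.0 and 12.0 joined by a single edge. The hierarchical part of the method corresponds to re-clustering the coarsened graph and applying the same step again.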
Submitted 24 March, 2024;
originally announced March 2024.
-
AKBR: Learning Adaptive Kernel-based Representations for Graph Classification
Authors:
Feifei Qian,
Lixin Cui,
Yue Wang,
Hangyuan Du,
Lu Bai,
Edwin R. Hancock
Abstract:
In this paper, we propose a new model to learn Adaptive Kernel-based Representations (AKBR) for graph classification. Unlike state-of-the-art R-convolution graph kernels, which are defined by merely counting pairs of isomorphic substructures between graphs and cannot provide an end-to-end learning mechanism for the classifier, the proposed AKBR approach defines an end-to-end representation learning model that constructs an adaptive kernel matrix for graphs. To this end, we commence by leveraging a novel feature-channel attention mechanism to capture the interdependencies between different substructure invariants of the original graphs. The proposed AKBR model can thus effectively identify the structural importance of different substructures and compute the R-convolution kernel between pairwise graphs associated with the more significant substructures specified by their structural attentions. Since each row of the resulting kernel matrix can theoretically be seen as the embedding vector of a sample graph, the proposed AKBR model is able to directly employ the resulting kernel matrix as the graph feature matrix and input it into the classifier for classification (i.e., the SoftMax layer), naturally providing an end-to-end learning architecture between the kernel computation and the classifier. Experimental results show that the proposed AKBR model outperforms existing state-of-the-art graph kernels and deep learning methods on standard graph benchmarks.
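A count-based R-convolution kernel with an isomorphism-matching base kernel reduces to a dot product of substructure count vectors. The sketch below, under stated assumptions, shows how per-substructure attention weights change that kernel and how each kernel-matrix row can act as a graph embedding. The attention scores are fixed inputs here (in AKBR they come from a learned feature-channel attention mechanism), the softmax normalisation is an assumption, and the function names are hypothetical.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_kernel_matrix(counts: List[List[float]], attn_scores: List[float]) -> List[List[float]]:
    """K[i][j] = sum_k w_k * counts[i][k] * counts[j][k], with w = softmax(attn_scores).

    counts[i][k] is how often substructure k occurs in graph i. With uniform
    weights this is (up to scale) a plain count-based R-convolution kernel;
    the attention weights re-scale the contribution of each substructure.
    """
    w = softmax(attn_scores)
    n = len(counts)
    return [[sum(wk * a * b for wk, a, b in zip(w, counts[i], counts[j])) for j in range(n)]
            for i in range(n)]
```

Row `K[i]` of the returned matrix then plays the role of the feature vector of graph i that is fed to the classifier. Raising the score of one substructure makes graphs that share it more similar and down-weights the others.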
Submitted 24 March, 2024;
originally announced March 2024.
-
Hierarchical Query Classification in E-commerce Search
Authors:
Bing He,
Sreyashi Nag,
Limeng Cui,
Suhang Wang,
Zheng Li,
Rahul Goutam,
Zhen Li,
Haiyang Zhang
Abstract:
E-commerce platforms typically store and structure product information and search data in a hierarchy. Efficiently categorizing user search queries into a similar hierarchical structure is paramount in enhancing user experience on e-commerce platforms as well as news curation and academic research. The significance of this task is amplified when dealing with sensitive query categorization or critical information dissemination, where inaccuracies can lead to considerable negative impacts. The inherent complexity of hierarchical query classification is compounded by two primary challenges: (1) the pronounced class imbalance that skews towards dominant categories, and (2) the inherent brevity and ambiguity of search queries that hinder accurate classification.
To address these challenges, we introduce a novel framework that leverages hierarchical information through (i) enhanced representation learning that uses a contrastive loss to discern fine-grained instance relationships within the hierarchy, called "instance hierarchy", and (ii) a nuanced hierarchical classification loss that attends to the intrinsic label taxonomy, named "label hierarchy". Additionally, based on our observation that certain unlabeled queries share typographical similarities with labeled queries, we propose a neighborhood-aware sampling technique to intelligently select these unlabeled queries to boost classification performance. Extensive experiments demonstrate that our proposed method outperforms the state of the art (SOTA) on the proprietary Amazon dataset and is comparable to SOTA on the public Web of Science and RCV1-V2 datasets. These results underscore the efficacy of our proposed solution and pave the way toward the next generation of hierarchy-aware query classification systems.
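The neighborhood-aware sampling idea (pick unlabeled queries that are typographically close to labeled ones) can be illustrated with a small sketch. It assumes character-level similarity from Python's `difflib.SequenceMatcher` as a stand-in for "typographical similarity" and a fixed threshold of 0.85; both are illustrative choices, not the paper's actual criterion, and `select_typo_neighbors` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

def select_typo_neighbors(labeled: Dict[str, str], unlabeled: List[str],
                          threshold: float = 0.85) -> List[Tuple[str, str, str, float]]:
    """Return (unlabeled_query, closest_labeled_query, its_label, similarity) for close matches.

    Similarity is difflib's ratio() on lower-cased strings (an assumed proxy
    for typographical similarity); queries below `threshold` are skipped.
    """
    selected = []
    for q in unlabeled:
        best, best_sim = None, 0.0
        for ref in labeled:
            sim = SequenceMatcher(None, q.lower(), ref.lower()).ratio()
            if sim > best_sim:
                best, best_sim = ref, sim
        if best is not None and best_sim >= threshold:
            selected.append((q, best, labeled[best], best_sim))
    return selected
```

For example, with labeled queries "iphone charger" and "running shoes", the misspelled "iphone chargr" is selected (similarity about 0.96) while "garden hose" is not. A brute-force scan like this is quadratic in the number of queries; a production system would more likely use an approximate nearest-neighbour index.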
Submitted 9 March, 2024;
originally announced March 2024.
-
VLBI Astrometry of Radio Stars to Link Radio and Optical Celestial Reference Frames: Observing Strategies
Authors:
Jingdong Zhang,
Bo Zhang,
Shuangjing Xu,
Niu Liu,
Wen Chen,
Hao Ding,
Pengfei Jiang,
Yan Sun,
Jinqing Wang,
Lang Cui,
Shiming Wen,
Xiaofeng Mai,
Jinling Li,
Fengchun Shu,
Yidan Huang
Abstract:
The Gaia celestial reference frame (Gaia-CRF) will benefit from a close assessment with independent methods, such as Very Long Baseline Interferometry (VLBI) measurements of radio stars at bright magnitudes. However, obtaining full astrometric parameters for each radio star through VLBI measurements demands a significant amount of observation time. This study proposes an efficient observing strategy that acquires double-epoch VLBI positions to measure the positions and proper motions of radio stars at reduced cost. A CRF-link solution compatible with individual VLBI position measurements is introduced, and the optimized scheduling of observing epochs is discussed. Applying this solution to observational data yields results that are sensitive to increases or decreases in the sample, yet remain consistent with the literature at the 1σ level. This suggests potential for improvement with a larger sample. Simulations of additional observations demonstrate that the double-epoch strategy reduces the uncertainties of the CRF-link parameters by over 30% compared with the five-parameter strategy.
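The per-star step of a double-epoch strategy can be sketched as linear motion through two VLBI positions. This is a minimal sketch under stated assumptions: parallax is ignored, the two epochs have independent errors equal in both coordinates, and offsets are measured from a fixed reference position. It does not reproduce the paper's CRF-link solution, which fits frame orientation and spin across the whole sample, and `Epoch`/`two_epoch_solution` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Epoch:
    t_yr: float       # observing epoch, e.g. decimal year
    ra_mas: float     # RA offset * cos(Dec) from a reference position, mas
    dec_mas: float    # Dec offset from the same reference position, mas
    sigma_mas: float  # 1-sigma positional uncertainty per coordinate, mas

def two_epoch_solution(e1: Epoch, e2: Epoch, t_ref: float):
    """Proper motion and reference-epoch position from two positions (parallax ignored)."""
    dt = e2.t_yr - e1.t_yr
    if dt == 0:
        raise ValueError("epochs must differ")
    pm_ra = (e2.ra_mas - e1.ra_mas) / dt      # mas/yr
    pm_dec = (e2.dec_mas - e1.dec_mas) / dt   # mas/yr
    # Independent errors at the two epochs propagate into the proper motion.
    sigma_pm = math.hypot(e1.sigma_mas, e2.sigma_mas) / abs(dt)
    # Position propagated linearly to a reference epoch (e.g. a catalogue epoch).
    ra_ref = e1.ra_mas + pm_ra * (t_ref - e1.t_yr)
    dec_ref = e1.dec_mas + pm_dec * (t_ref - e1.t_yr)
    return (pm_ra, pm_dec, sigma_pm), (ra_ref, dec_ref)
```

With positions (0, 0) mas in 2020.0 and (20, −8) mas in 2024.0, each with 0.1 mas uncertainty, the sketch gives a proper motion of (5.0, −2.0) mas/yr with about 0.035 mas/yr uncertainty, and a position of (−20.0, 8.0) mas at 2016.0 (the Gaia DR3 reference epoch). Because the proper-motion error scales as 1/Δt, a longer epoch separation directly tightens the constraint.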
Submitted 26 March, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Optimizing Mobile-Friendly Viewport Prediction for Live 360-Degree Video Streaming
Authors:
Lei Zhang,
Tao Long,
Weizhen Xu,
Laizhong Cui,
Jiangchuan Liu
Abstract:
Viewport prediction is a crucial task for adaptive 360-degree video streaming, as bitrate control algorithms usually require knowledge of which portions of each frame the user is viewing. Various methods, ranging from less accurate statistical tools to highly calibrated deep neural networks, have been studied and adopted for viewport prediction. Conventionally, it is difficult to implement sophisticated deep learning methods on mobile devices, which have limited computational capability. In this work, we propose an advanced learning-based viewport prediction approach, carefully designed to introduce minimal transmission and computation overhead for mobile terminals. We also propose a saliency prediction network trainer based on model-agnostic meta-learning (MAML), which provides a few-sample fast training solution that obtains the prediction model by utilizing information from past models. We further discuss how to integrate this mobile-friendly viewport prediction (MFVP) approach into a typical live 360-degree video streaming system by formulating and solving the bitrate adaptation problem. Extensive experimental results show that our prediction approach works in real time for live video streaming and achieves higher accuracy than other existing prediction methods on mobile devices, which, together with our bitrate adaptation algorithm, significantly improves streaming QoE in various respects. We observe that the accuracy of MFVP is 8.1% to 28.7% higher than that of other algorithms, and that MFVP achieves a 3.73% to 14.96% higher average quality level and 49.6% to 74.97% less quality-level change than other algorithms.
Submitted 5 March, 2024;
originally announced March 2024.
-
Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal
Authors:
Jianheng Huang,
Leyang Cui,
Ante Wang,
Chengyi Yang,
Xinting Liao,
Linfeng Song,
Junfeng Yao,
Jinsong Su
Abstract:
Large language models (LLMs) suffer from catastrophic forgetting during continual learning. Conventional rehearsal-based methods rely on previous training data to retain the model's ability, which may not be feasible in real-world applications. When conducting continual learning from a publicly released LLM checkpoint, the original training data may be unavailable. To address this challenge, we propose a framework called Self-Synthesized Rehearsal (SSR) that uses the LLM to generate synthetic instances for rehearsal. Concretely, we first employ the base LLM for in-context learning to generate synthetic instances. Subsequently, we use the latest LLM to refine the instance outputs based on the synthetic inputs, preserving its acquired ability. Finally, we select diverse, high-quality synthetic instances for rehearsal in future stages. Experimental results demonstrate that SSR achieves superior or comparable performance compared to conventional rehearsal-based approaches while being more data-efficient. In addition, SSR effectively preserves the generalization capabilities of LLMs in general domains.
Submitted 25 May, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
-
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers
Authors:
Qintong Li,
Leyang Cui,
Xueliang Zhao,
Lingpeng Kong,
Wei Bi
Abstract:
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. However, there is increasing debate over whether these models truly understand and apply mathematical knowledge or merely rely on shortcuts for mathematical reasoning. One essential and frequently cited piece of evidence is that LLMs can behave incorrectly when math questions are slightly changed. This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations. We introduce the adversarial grade school math (GSM-Plus) dataset, an extension of GSM8K augmented with various mathematical perturbations. Our experiments on 25 LLMs and 4 prompting techniques show that while LLMs exhibit different levels of math reasoning ability, their performance is far from robust. In particular, even for problems that have been solved in GSM8K, LLMs can make mistakes when new statements are added or the question targets are altered. We also explore whether more robust performance can be achieved by composing existing prompting methods, in which we try an iterative method that generates and verifies each intermediate thought based on its reasoning goal and calculation result.
△ Less
Submitted 1 July, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Retrieval is Accurate Generation
Authors:
Bowen Cao,
Deng Cai,
Leyang Cui,
Xuxin Cheng,
Wei Bi,
Yuexian Zou,
Shuming Shi
Abstract:
Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retrieved from numerous possible documents. To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement. Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. For instance, compared to the standard language model counterpart, our model raises the accuracy from 23.47% to 36.27% on OpenbookQA, and improves the MAUVE score from 42.61% to 81.58% in open-ended text generation. Remarkably, our model also achieves the best performance and the lowest latency among several retrieval-augmented baselines. In conclusion, we assert that retrieval is more accurate generation and hope that our work will encourage further research on this new paradigm shift.
Submitted 16 March, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
An inexact Bregman proximal point method and its acceleration version for unbalanced optimal transport
Authors:
Xiang Chen,
Faqiang Wang,
Jun Liu,
Li Cui
Abstract:
The Unbalanced Optimal Transport (UOT) problem plays an increasingly important role in computational biology, computational imaging and deep learning. The Scaling algorithm is widely used to solve UOT due to its convenience and good convergence properties. However, this algorithm has lower accuracy for large regularization parameters, and, due to stability issues, small regularization parameters can easily lead to numerical overflow. We address this challenge by developing an inexact Bregman proximal point method for solving UOT. This algorithm approximates the proximal operator using the Scaling algorithm at each iteration. The algorithm (1) converges to the true solution of UOT, (2) has theoretical guarantees and robust regularization parameter selection, (3) mitigates numerical stability issues, and (4) can achieve computational complexity comparable to the Scaling algorithm in specific practical settings. Building upon this, we develop an accelerated version of the inexact Bregman proximal point method for solving UOT by using acceleration techniques of the Bregman proximal point method, and provide theoretical guarantees and experimental validation of convergence and acceleration.
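To make the "Scaling algorithm inside a Bregman proximal point loop" idea concrete, here is a minimal pure-Python sketch. It assumes the KL-penalized UOT objective min_P <C,P> + rho*KL(P1|a) + rho*KL(P^T 1|b) and a KL proximal term eta*KL(P|P_k); the parameter names rho, eta, outer and inner are hypothetical, and this is not the authors' algorithm, stopping rule or acceleration scheme.

```python
import math

# Sketch (assumption): KL-penalized UOT
#   min_P <C, P> + rho * KL(P 1 | a) + rho * KL(P^T 1 | b)
# solved by an inexact Bregman proximal point loop whose KL-proximal
# subproblems are each handled by a few Scaling (Sinkhorn-like) iterations.


def scaling_uot(K, a, b, rho, eps, iters):
    """Scaling iterations for entropic UOT with kernel K and KL marginal penalties."""
    n, m = len(a), len(b)
    fi = rho / (rho + eps)  # damping exponent induced by the KL marginal penalty
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / Kv[i]) ** fi for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / Ktu[j]) ** fi for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def inexact_bregman_ppm(C, a, b, rho=1.0, eta=0.5, outer=50, inner=5):
    """Outer proximal loop: each step adds eta * KL(P | P_k) to the UOT objective."""
    n, m = len(a), len(b)
    P = [[a[i] * b[j] for j in range(m)] for i in range(n)]  # strictly positive start
    for _ in range(outer):
        # The proximal subproblem is an entropic UOT whose reference measure is P_k,
        # so its Gibbs kernel is P_k * exp(-C / eta).
        K = [[P[i][j] * math.exp(-C[i][j] / eta) for j in range(m)] for i in range(n)]
        # "Inexact": only a few Scaling iterations per proximal step.
        P = scaling_uot(K, a, b, rho, eta, inner)
    return P
```

For example, `inexact_bregman_ppm([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])` drives the plan toward the unregularized optimum diag(0.5, 0.5) while eta stays moderate, which illustrates why the proximal outer loop sidesteps the tiny regularization parameters that cause overflow in a single Scaling run.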
Submitted 26 February, 2024;
originally announced February 2024.
-
HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Authors:
Li Pang,
Xiangyu Rui,
Long Cui,
Hongzhong Wang,
Deyu Meng,
Xiangyong Cao
Abstract:
Hyperspectral image (HSI) restoration aims at recovering clean images from degraded observations and plays a vital role in downstream tasks. Existing model-based methods have limitations in accurately modeling the complex image characteristics with handcrafted priors, and deep learning-based methods suffer from poor generalization ability. To alleviate these issues, this paper proposes an unsupervised HSI restoration framework with pre-trained diffusion model (HIR-Diff), which restores the clean HSIs from the product of two low-rank components, i.e., the reduced image and the coefficient matrix. Specifically, the reduced image, which has a low spectral dimension, lies in the image field and can be inferred from our improved diffusion model where a new guidance function with total variation (TV) prior is designed to ensure that the reduced image can be well sampled. The coefficient matrix can be effectively pre-estimated based on singular value decomposition (SVD) and rank-revealing QR (RRQR) factorization. Furthermore, a novel exponential noise schedule is proposed to accelerate the restoration process (about 5$\times$ acceleration for denoising) with little performance decrease. Extensive experimental results validate the superiority of our method in both performance and speed on a variety of HSI restoration tasks, including HSI denoising, noisy HSI super-resolution, and noisy HSI inpainting. The code is available at https://github.com/LiPang/HIRDiff.
Submitted 24 February, 2024;
originally announced February 2024.
-
Expediting In-Network Federated Learning by Voting-Based Consensus Model Compression
Authors:
Xiaoxin Su,
Yipeng Zhou,
Laizhong Cui,
Song Guo
Abstract:
Recently, federated learning (FL) has gained momentum because of its capability to preserve data privacy. To conduct model training by FL, multiple clients exchange model updates with a parameter server via the Internet. To accelerate communication, prior work has explored deploying a programmable switch (PS) in lieu of the parameter server to coordinate clients. The challenge of deploying the PS in FL lies in its scarce memory space, which prohibits running memory-consuming aggregation algorithms on the PS. To overcome this challenge, we propose the Federated Learning in-network Aggregation with Compression (FediAC) algorithm, consisting of two phases: client voting and model aggregating. In the former phase, clients report their significant model update indices to the PS to estimate global significant model updates. In the latter phase, clients upload global significant model updates to the PS for aggregation. FediAC consumes much less memory space and communication traffic than existing works because the first phase can guarantee consensus compression across clients. The PS easily aligns model update indices to swiftly complete aggregation in the second phase. Finally, we conduct extensive experiments using public datasets to demonstrate that FediAC remarkably surpasses the state-of-the-art baselines in terms of model accuracy and communication traffic.
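The two-phase flow (vote on indices, then aggregate aligned values) can be illustrated with a small sketch. It assumes each client votes for its top-k largest-magnitude coordinates and that the switch keeps indices reaching a vote threshold; k, min_votes and all function names are hypothetical, and the abstract does not say how FediAC encodes or thresholds the votes.

```python
from collections import Counter


def client_vote(update, k):
    """Phase 1 (client, assumed top-k rule): report the k largest-magnitude indices."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return order[:k]


def server_select(votes, min_votes):
    """Phase 1 (switch): keep indices that at least min_votes clients voted for."""
    counts = Counter(i for ballot in votes for i in ballot)
    return sorted(i for i, c in counts.items() if c >= min_votes)


def aggregate(updates, indices):
    """Phase 2: all clients upload values at the same indices, so slots align and can be averaged."""
    n = len(updates)
    return {i: sum(u[i] for u in updates) / n for i in indices}
```

Because every client uploads exactly the consensus index set, the switch only needs one accumulator per selected index rather than per-client sparse index lists, which is the memory saving the abstract attributes to consensus compression.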
Submitted 6 February, 2024;
originally announced February 2024.
-
Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes
Authors:
Xiaoxin Su,
Yipeng Zhou,
Laizhong Cui,
John C. S. Lui,
Jiangchuan Liu
Abstract:
In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds, without touching private data owned by individual clients. FL is appealing in preserving data privacy; yet the communication between the PS and scattered clients can be a severe bottleneck. Model compression algorithms, such as quantization and sparsification, have been suggested, but they generally assume a fixed code length, which does not reflect the heterogeneity and variability of model updates. In this paper, through both analysis and experiments, we show strong evidence that variable-length coding is beneficial for compression in FL. We accordingly present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates. We develop an optimal tuning strategy that minimizes the loss function (equivalent to maximizing the model utility) subject to the budget for communication. We further demonstrate that Fed-CVLC is indeed a general compression design that bridges quantization and sparsification, with greater flexibility. Extensive experiments have been conducted with public datasets to demonstrate that Fed-CVLC remarkably outperforms state-of-the-art baselines, improving model utility by 1.50%-5.44%, or shrinking communication traffic by 16.67%-41.61%.
Submitted 6 February, 2024;
originally announced February 2024.
-
On the Broadening of the Pulse Width of FRB 20121102A due to Propagation and Instrumental Effects
Authors:
Jia-Peng Wei,
Yong-Feng Huang,
Lang Cui,
Xiang Liu,
Jin-Jun Geng,
Xue-Feng Wu
Abstract:
The pulse widths of fast radio bursts are always broadened by scattering in the plasma medium through which the electromagnetic wave passes. The recorded pulse width is further affected by the radio telescope, since the sampling time and the bandwidth cannot be infinitely small. In this study, we focus on the pulse widths of the 3287 bursts detected from FRB 20121102A as of October 2023. Various effects such as the scattering broadening, the redshift broadening and the instrumental broadening are examined. It is found that the instrumental broadening contributes only a fraction of $10^{-3}$--$10^{-1}$ to the observed pulse width. The scattering broadening is even smaller, constituting a tiny fraction of $10^{-5}$--$10^{-2}$ of the observed pulse width. After correcting for these broadenings, the intrinsic pulse width is derived for each burst. The maximum and minimum pulse widths at different frequencies are highlighted. Interestingly, both the mean value and the dispersion range of intrinsic pulse width are found to be inversely proportional to the square of the central frequency. The intrinsic widths of most bursts are in a narrow range of 1--10 ms, which leads to a quasi-linear correlation between the fluence and the peak flux.
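As a rough illustration of "correcting for these broadenings", the sketch below assumes the common textbook model in which independent broadening terms add in quadrature, intra-channel dispersion smearing is approximately 8.3 us x DM x (channel width in MHz) x (frequency in GHz)^-3, and cosmological time dilation stretches the rest-frame width by (1 + z). This model and all numbers in the example are assumptions for illustration, not the paper's exact correction or data.

```python
import math

# Sketch under common assumptions (not taken from the paper): independent
# broadening terms add in quadrature, and cosmological time dilation stretches
# the rest-frame width by (1 + z).


def dm_smearing_ms(dm, chan_bw_mhz, freq_ghz):
    """Intra-channel dispersion smearing: ~8.3 us * DM * dnu_MHz * nu_GHz**-3, in ms."""
    return 8.3e-3 * dm * chan_bw_mhz * freq_ghz ** -3


def intrinsic_width_ms(w_obs_ms, t_samp_ms, dm, chan_bw_mhz, freq_ghz, tau_scat_ms, z):
    """Remove instrumental and scattering terms in quadrature, then correct for redshift."""
    t_dm = dm_smearing_ms(dm, chan_bw_mhz, freq_ghz)
    w2 = w_obs_ms ** 2 - t_samp_ms ** 2 - t_dm ** 2 - tau_scat_ms ** 2
    if w2 <= 0:
        raise ValueError("broadening terms exceed the observed width")
    return math.sqrt(w2) / (1.0 + z)  # observer-frame width -> source rest frame
```

With illustrative inputs (a 3 ms burst at 1.25 GHz, DM = 560 pc cm^-3, 0.122 MHz channels, 0.098 ms sampling, 0.001 ms scattering, z = 0.19), the smearing term is about 0.29 ms, and the quadrature removal changes the width far less than the (1 + z) factor does. This is consistent in spirit with the small instrumental and scattering fractions reported above.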
Submitted 4 February, 2024;
originally announced February 2024.
-
Gravitational Wave Emission from Close-in Strange Quark Planets Around Strange Stars with Magnetic Interactions
Authors:
Xiao-Li Zhang,
Ze-Cheng Zou,
Yong-Feng Huang,
Hao-Xuan Gao,
Pei Wang,
Lang Cui,
Xiang Liu
Abstract:
According to the strange quark matter hypothesis, strange planets may exist, which are planetary-mass objects composed of almost equal numbers of up, down and strange quarks. A strange planet can revolve around its host strange star in a very close-in orbit. When it finally merges with the host, strong gravitational waves will be emitted. Here the gravitational waveforms are derived for the merging process, taking into account the effects of the strange star's magnetic field on the dynamics. Effects of the inclination angle are also considered. Templates of the gravitational waveforms are derived. It is found that the magnetic interactions significantly speed up the merging process. Coalescence events of such strange planetary systems occurring in our Galaxy as well as in local galaxies can be effectively detected by current and future gravitational-wave experiments, which may provide a new method to test the strange quark matter hypothesis and probe the magnetic field of compact stars.
Submitted 7 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Towards Personalized Privacy: User-Governed Data Contribution for Federated Recommendation
Authors:
Liang Qu,
Wei Yuan,
Ruiqi Zheng,
Lizhen Cui,
Yuhui Shi,
Hongzhi Yin
Abstract:
Federated recommender systems (FedRecs) have gained significant attention for their potential to protect users' privacy by keeping private user data local and only communicating model parameters/gradients to the server. Nevertheless, the existing architecture of FedRecs assumes that all users have the same 0-privacy budget, i.e., they do not upload any data to the server, thus overlooking those users who are less concerned about privacy and are willing to upload data to get a better recommendation service. To bridge this gap, this paper explores a user-governed data contribution federated recommendation architecture where users are free to control whether they share data and what proportion of their data they share with the server. To this end, this paper presents a cloud-device collaborative graph neural network federated recommendation model, named CDCGNNFed. It trains user-centric ego graphs locally, and high-order graphs based on user-shared data in the server in a collaborative manner via contrastive learning. Furthermore, a graph mending strategy is utilized to predict missing links in the graph on the server, thus leveraging the capabilities of graph neural networks over high-order graphs. Extensive experiments were conducted on two public datasets, and the results demonstrate the effectiveness of the proposed method.
Submitted 31 January, 2024;
originally announced January 2024.
-
Decentralized Collaborative Learning with Adaptive Reference Data for On-Device POI Recommendation
Authors:
Ruiqi Zheng,
Liang Qu,
Tong Chen,
Lizhen Cui,
Yuhui Shi,
Hongzhi Yin
Abstract:
In Location-based Social Networks, Point-of-Interest (POI) recommendation helps users discover interesting places. There is a trend to move from the cloud-based model to on-device recommendations for privacy protection and reduced server reliance. Due to the scarcity of local user-item interactions on individual devices, solely relying on local instances is not adequate. Collaborative Learning (CL) emerges to promote model sharing among users, where reference data is an intermediary that allows users to exchange their soft decisions without directly sharing their private data or parameters, ensuring privacy and benefiting from collaboration. However, existing CL-based recommendations typically use a single reference for all users. Reference data valuable for one user might be harmful to another, given diverse user preferences. Users may not offer meaningful soft decisions on items outside their interest scope. Consequently, using the same reference data for all collaborations can impede knowledge exchange and lead to sub-optimal performance. To address this gap, we introduce the Decentralized Collaborative Learning with Adaptive Reference Data (DARD) framework, which crafts adaptive reference data for effective user collaboration. It first generates a desensitized public reference data pool with transformation and probability data generation methods. For each user, the selection of adaptive reference data is executed in parallel by training loss tracking and influence function. Local models are trained with individual private data and collaboratively with the geographical and semantic neighbors. During the collaboration between two users, they exchange soft decisions based on a combined set of their adaptive reference data. Our evaluations across two real-world datasets highlight DARD's superiority in recommendation performance and addressing the scarcity of available reference data.
Submitted 24 January, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Large receptive field strategy and important feature extraction strategy in 3D object detection
Authors:
Leichao Cui,
Xiuxian Li,
Min Meng,
Guangyu Jia
Abstract:
The enhancement of 3D object detection is pivotal for precise environmental perception and improved task execution capabilities in autonomous driving. LiDAR point clouds, offering accurate depth information, serve as a crucial source of information for this purpose. Our study focuses on key challenges in 3D target detection. To tackle the challenge of expanding the receptive field of a 3D convolutional kernel, we introduce the Dynamic Feature Fusion Module (DFFM). This module achieves adaptive expansion of the 3D convolutional kernel's receptive field, balancing the expansion with acceptable computational loads. This innovation reduces operations, expands the receptive field, and allows the model to dynamically adjust to different object requirements. Simultaneously, we identify redundant information in 3D features. The Feature Selection Module (FSM) quantitatively evaluates and eliminates non-important features, achieving the separation of output box fitting and feature extraction. This innovation enables the detector to focus on critical features, resulting in model compression, reduced computational burden, and minimized candidate frame interference. Extensive experiments confirm that both DFFM and FSM not only enhance current benchmarks, particularly in small target detection, but also accelerate network performance. Importantly, these modules exhibit effective complementarity.
Submitted 10 March, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Constraining annihilating dark matter using the multi-frequency radio flux profiles of the M33 galaxy
Authors:
Man Ho Chan,
Chak Man Lee,
Lang Cui,
Ning Chang,
Chun Sing Leung
Abstract:
Radio data can give stringent constraints for annihilating dark matter. In general, radio observations can measure the radio flux densities of nearby galaxies very accurately, with high resolution and at different frequencies. We are able to obtain the radio flux density as a function of distance from the galactic center and frequency, $S(r,ν)$. In this article, we demonstrate a comprehensive radio analysis of the M33 galaxy, combining the radio flux density profile $S(r)$ and the frequency spectrum $S(ν)$ to obtain constraints on the dark matter annihilation parameters. By analyzing the archival radio data obtained from the Effelsberg telescope, we show that the dark matter annihilation contributing to the radio flux density might be insignificant in the disk region of the M33 galaxy. Moreover, by including the baryonic radio contribution, we constrain the $2σ$ conservative upper limits of the annihilation cross section, which can be complementary to the existing constraints based on neutrino, cosmic-ray, and gamma-ray observations. Our results indicate that analyzing the galactic multi-frequency radio flux profiles can give useful and authentic constraints on dark matter for the leptophilic annihilation channels.
Submitted 21 January, 2024;
originally announced January 2024.
-
Knowledge Verification to Nip Hallucination in the Bud
Authors:
Fanqi Wan,
Xinting Huang,
Leyang Cui,
Xiaojun Quan,
Wei Bi,
Shuming Shi
Abstract:
While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as \emph{hallucination}. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. Specifically, we propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge to evaluate the knowledge boundaries of foundation LLMs. To address knowledge inconsistencies in the alignment data, KCA implements several specific strategies to deal with these data instances. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales. This confirms the effectiveness of mitigating hallucinations by reducing knowledge inconsistency. Our code, model weights, and data are openly accessible at \url{https://github.com/fanqiwan/KCA}.
Submitted 16 April, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Event-Based Visual Odometry on Non-Holonomic Ground Vehicles
Authors:
Wanting Xu,
Si'ao Zhang,
Li Cui,
Xin Peng,
Laurent Kneip
Abstract:
Despite the promise of superior performance under challenging conditions, event-based motion estimation remains a hard problem owing to the difficulty of extracting and tracking stable features from event streams. In order to robustify the estimation, it is generally believed that fusion with other sensors is a requirement. In this work, we demonstrate reliable, purely event-based visual odometry on planar ground vehicles by employing the constrained non-holonomic motion model of Ackermann steering platforms. We extend single feature n-linearities for regular frame-based cameras to the case of quasi time-continuous event-tracks, and achieve a polynomial form via variable degree Taylor expansions. Robust averaging over multiple event tracks is simply achieved via histogram voting. As demonstrated on both simulated and real data, our algorithm achieves accurate and robust estimates of the vehicle's instantaneous rotational velocity, and thus results that are comparable to the delta rotations obtained by frame-based sensors under normal conditions. We furthermore significantly outperform the more traditional alternatives in challenging illumination scenarios. The code is available at \url{https://github.com/gowanting/NHEVO}.
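The phrase "robust averaging over multiple event tracks is simply achieved via histogram voting" can be sketched as follows. The sketch assumes each event track has already produced one scalar estimate of the rotational velocity; the bin width and the function name are hypothetical, and the paper's actual voting details may differ.

```python
import math


def histogram_vote(estimates, bin_width):
    """Pick the most populated bin of per-track estimates and return the mean of its members."""
    bins = {}
    for w in estimates:
        # Bin index: floor of the estimate in units of the (hypothetical) bin width.
        bins.setdefault(math.floor(w / bin_width), []).append(w)
    winner = max(bins.values(), key=len)  # mode bin; outlier tracks land in other bins
    return sum(winner) / len(winner)
```

For instance, `histogram_vote([0.53, 0.55, 0.54, 0.56, 2.0, -1.3], 0.1)` ignores the two outlier tracks and returns roughly 0.545. Unlike a plain mean, the vote is insensitive to a minority of badly tracked features, which is why it serves as a cheap robust estimator here.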
Submitted 17 January, 2024;
originally announced January 2024.
-
Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models
Authors:
Shuming Shi,
Enbo Zhao,
Deng Cai,
Leyang Cui,
Xinting Huang,
Huayang Li
Abstract:
We present Inferflow, an efficient and highly configurable inference engine for large language models (LLMs). With Inferflow, users can serve most of the common transformer models by simply modifying some lines in corresponding configuration files, without writing a single line of source code. Compared with most existing inference engines, Inferflow has some key features. First, by implementing a modular framework of atomic building blocks and technologies, Inferflow is compositionally generalizable to new models. Second, 3.5-bit quantization is introduced in Inferflow as a tradeoff between 3-bit and 4-bit quantization. Third, hybrid model partitioning for multi-GPU inference is introduced in Inferflow to better balance inference speed and throughput than the existing partition-by-layer and partition-by-tensor strategies.
Submitted 16 January, 2024;
originally announced January 2024.
-
Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics
Authors:
Beiwen Tian,
Huan-ang Gao,
Leiyao Cui,
Yupeng Zheng,
Lan Luo,
Baofeng Wang,
Rong Zhi,
Guyue Zhou,
Hao Zhao
Abstract:
In the past several years, road anomaly segmentation has been actively explored in academia and has drawn growing attention in industry. The rationale behind it is straightforward: if the autonomous car can brake before hitting an anomalous object, safety is promoted. However, this rationale naturally calls for a temporally informed setting, while existing methods and benchmarks are designed in an unrealistic frame-wise manner. To bridge this gap, we contribute the first video anomaly segmentation dataset for autonomous driving. Since placing various anomalous objects on busy roads and annotating them in every frame are dangerous and expensive, we resort to synthetic data. To improve the relevance of this synthetic dataset to real-world applications, we train a generative adversarial network conditioned on rendering G-buffers for photorealism enhancement. Our dataset consists of 120,000 high-resolution frames at a 60 FPS framerate, as recorded in 7 different towns. As an initial benchmarking, we provide baselines using the latest supervised and unsupervised road anomaly segmentation methods. Apart from conventional ones, we focus on two new metrics: temporal consistency and latency-aware streaming accuracy. We believe the latter is valuable as it measures whether an anomaly segmentation algorithm can truly prevent a car from crashing in a temporally informed setting.
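The idea behind a latency-aware streaming metric can be sketched as follows. At each frame time, the newest prediction that has actually finished is scored against the current ground truth. The sketch assumes a fixed per-frame latency, independent processing of each frame, masks given as sets of pixel indices, and IoU as the per-frame score. The function name and these choices are hypothetical and are not the paper's metric definition.

```python
import math


def latency_aware_iou(pred_masks, gt_masks, fps, latency_s):
    """Score frame i against the newest prediction finished by time i / fps (assumed fixed latency)."""
    lag = math.ceil(latency_s * fps - 1e-9)  # whole frames elapsed before a result is ready
    scores = []
    for i, gt in enumerate(gt_masks):
        j = i - lag  # newest input frame whose prediction is available at time i / fps
        pred = pred_masks[j] if j >= 0 else set()  # nothing finished yet: empty mask
        union = pred | gt
        scores.append(len(pred & gt) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

A segmenter that is perfect offline but takes one frame (1/60 s at 60 FPS) per prediction scores only 0.25 on an object that moves one pixel per frame, e.g. `gt = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]`. This illustrates why frame-wise accuracy alone can overstate how well an algorithm would help a moving car.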
Submitted 10 January, 2024;
originally announced January 2024.
-
RHOBIN Challenge: Reconstruction of Human Object Interaction
Authors:
Xianghui Xie,
Xi Wang,
Nikos Athanasiou,
Bharat Lal Bhatnagar,
Chun-Hao P. Huang,
Kaichun Mo,
Hao Chen,
Xia Jia,
Zerui Zhang,
Liangxian Cui,
Xiao Lin,
Bingqiao Qian,
Jie Xiao,
Wenfei Yang,
Hyeongjin Nam,
Daniel Sungho Jung,
Kihoon Kim,
Kyoung Mu Lee,
Otmar Hilliges,
Gerard Pons-Moll
Abstract:
Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is, however, a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose and object pose but also the interaction between them. Reconstruction of 3D humans and reconstruction of objects have long been two separate research fields in computer vision. We therefore proposed the first RHOBIN challenge: reconstruction of human-object interactions in conjunction with the RHOBIN workshop. It was aimed at bringing the research communities of human and object reconstruction as well as interaction modeling together to discuss techniques and exchange ideas. Our challenge consists of three tracks of 3D reconstruction from monocular RGB images with a focus on dealing with challenging interaction scenarios. Our challenge attracted more than 100 participants with more than 300 submissions, indicating the broad interest in the research communities. This paper describes the settings of our challenge and discusses the winning methods of each track in more detail. We observe that the human reconstruction task is becoming mature even under heavy occlusion settings while object pose estimation and joint reconstruction remain challenging tasks. With the growing interest in interaction modeling, we hope this report can provide useful insights and foster future research in this direction. Our workshop website can be found at \href{https://rhobin-challenge.github.io/}{https://rhobin-challenge.github.io/}.
Submitted 7 January, 2024;
originally announced January 2024.
-
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Authors:
Yue Zhang,
Leyang Cui,
Wei Bi,
Shuming Shi
Abstract:
Despite their impressive capabilities, large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information, a phenomenon commonly known as ``hallucination''. In this work, we propose a simple \textit{Induce-then-Contrast} Decoding (ICD) strategy to alleviate hallucinations. We first construct a factually weak LLM by inducing hallucinations from the original LLMs. Then, we penalize these induced hallucinations during decoding to enhance the factuality of the generated content. Concretely, we determine the final next-token predictions by amplifying the predictions from the original model and downplaying the induced untruthful predictions via contrastive decoding. Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and \textsc{FActScore}, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families. For example, when equipped with ICD, Llama2-7B-Chat and Mistral-7B-Instruct achieve performance comparable to ChatGPT and GPT4 on TruthfulQA, respectively.
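A single Induce-then-Contrast decoding step can be sketched as plain contrastive decoding over next-token log-probabilities. The sketch assumes the common form score = (1 + alpha) * log p_original - alpha * log p_induced. The weight alpha and the function names are hypothetical, and the paper's exact weighting and any plausibility filtering on candidate tokens may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def induce_then_contrast(orig_logits, induced_logits, alpha=1.0):
    """Amplify the original model and penalize tokens the hallucination-induced model favors."""
    lp_o = log_softmax(orig_logits)
    lp_i = log_softmax(induced_logits)
    return [(1 + alpha) * o - alpha * h for o, h in zip(lp_o, lp_i)]


def greedy_next_token(orig_logits, induced_logits, alpha=1.0):
    """Greedy choice over the contrasted scores (alpha=0 recovers ordinary greedy decoding)."""
    scores = induce_then_contrast(orig_logits, induced_logits, alpha)
    return max(range(len(scores)), key=scores.__getitem__)
```

With `orig_logits = [2.0, 1.8, 0.0]` and `induced_logits = [3.0, 0.0, 0.0]`, ordinary greedy decoding picks token 0. The contrast with alpha = 1.0 switches to token 1, because token 0 is exactly what the factually weak model overconfidently prefers.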
Submitted 11 March, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.