Search | arXiv e-print repository

doi 10.1093/mnrasl/slad178

VLBI detection of the AE Aqr twin, LAMOST J024048.51+195226.9

Authors: Pengfei Jiang, Lang Cui, Xiang Liu, Bo Zhang, Yongfeng Huang, Hongmin Cao, Tao An, Jun Yang, Fengchun Shu, Guiping Tan, Jianping Yuan

Abstract: LAMOST J024048.51+195226.9 (J0240+1952) was recently identified as the second AE Aquarii (AE Aqr)-type cataclysmic variable, possessing the fastest known rotating white dwarf. We performed a Very Long Baseline Interferometry (VLBI) observation of J0240+1952 utilizing the European VLBI Network at 1.7\,GHz, to obtain the first view of the radio morphology on mas scale. Our high-resolution VLBI image clearly shows that the radio emission is compact on mas scale ($\lesssim2$\,AU), with no evidence for a radio jet or extended emission. The compact radio source has an average flux density of $\sim0.37$\,mJy, and its brightness temperature is given at $\gtrsim2.3\times10^{7}$\,K, confirming a non-thermal origin. The emission exhibits irregular variations on a time-scale of tens of minutes, similar to the radio flares seen in AE Aqr. The measured VLBI position of J0240+1952 is consistent with that derived from \textit{Gaia}. Our results favour the model in which the radio emission is attributed to a superposition of synchrotron radiation from expanding magnetized blobs of this system.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.16465 [pdf, other]

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

Authors: Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Abstract: The diffusion model has been proven a powerful generative model in recent years, yet remains a challenge in generating visual text. Several methods alleviated this issue by incorporating explicit text position and content as guidance on where and what text to render. However, these methods still suffer from several drawbacks, such as limited flexibility and automation, constrained capability of layout prediction, and restricted style diversity. In this paper, we present TextDiffuser-2, aiming to unleash the power of language models for text rendering. Firstly, we fine-tune a large language model for layout planning. The large language model is capable of automatically generating keywords for text rendering and also supports layout modification through chatting. Secondly, we utilize the language model within the diffusion model to encode the position and texts at the line level. Unlike previous methods that employed tight character-level guidance, this approach generates more diverse text images. We conduct extensive experiments and incorporate user studies involving human participants as well as GPT-4V, validating TextDiffuser-2's capacity to achieve a more rational text layout and generation with enhanced diversity. The code and model will be available at \url{https://aka.ms/textdiffuser-2}.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.09802 [pdf, other]

Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs

Authors: Sen Yang, Xin Li, Leyang Cui, Lidong Bing, Wai Lam

Abstract: Though prompting LLMs with various reasoning structures produces reasoning proofs along with answers, these proofs are not ensured to be causal and reliable due to the inherent defects of LLMs. Tracking such deficiencies, we present a neuro-symbolic integration method, in which a neural LLM is used to represent the knowledge of the problem while an LLM-free symbolic solver is adopted to do deliberative reasoning using the knowledge. Specifically, our customized meta-interpreters allow the production of reasoning proofs and support flexible search strategies. These reasoning proofs are ensured to be causal and reliable because of the deterministic executing nature of the symbolic solvers. Empirically, on ProofWriter, our method surpasses the CoT baseline by nearly double in accuracy and more than triple in proof similarity. On GSM8K, our method also shows accuracy improvements and nearly doubled proof similarity. Our code is released at https://github.com/DAMO-NLP-SG/CaRing

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.07624 [pdf]

Disordered hyperuniformity signals functioning and resilience of self-organized vegetation patterns

Authors: Wensi Hu, Quan-Xing Liu, Bo Wang, Nuo Xu, Lijuan Cui, Chi Xu

Abstract: In harsh environments, organisms may self-organize into spatially patterned systems in various ways. So far, studies of ecosystem spatial self-organization have primarily focused on apparent orders reflected by regular patterns. However, self-organized ecosystems may also have cryptic orders that can be unveiled only through certain quantitative analyses. Here we show that disordered hyperuniformity as a striking class of hidden orders can exist in spatially self-organized vegetation landscapes. By analyzing the high-resolution remotely sensed images across the American drylands, we demonstrate that it is not uncommon to find disordered hyperuniform vegetation states characterized by suppressed density fluctuations at long range. Such long-range hyperuniformity has been documented in a wide range of microscopic systems. Our finding contributes to expanding this domain to accommodate natural landscape ecological systems. We use theoretical modeling to propose that disordered hyperuniform vegetation patterning can arise from three generalized mechanisms prevalent in dryland ecosystems, including (1) critical absorbing states driven by an ecological legacy effect, (2) scale-dependent feedbacks driven by plant-plant facilitation and competition, and (3) density-dependent aggregation driven by plant-sediment feedbacks. Our modeling results also show that disordered hyperuniform patterns can help ecosystems cope with arid conditions with enhanced functioning of soil moisture acquisition. However, this advantage may come at the cost of slower recovery of ecosystem structure upon perturbations. Our work highlights that disordered hyperuniformity as a distinguishable but underexplored ecosystem self-organization state merits systematic studies to better understand its underlying mechanisms, functioning, and resilience.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 34 pages, 6 figures; Supplementary Materials, 19 pages, 10 figures, 2 tables

arXiv:2311.07324 [pdf, other]

DAGC: Data-Volume-Aware Adaptive Sparsification Gradient Compression for Distributed Machine Learning in Mobile Computing

Authors: Rongwei Lu, Yutong Jiang, Yinan Mao, Chen Tang, Bin Chen, Laizhong Cui, Zhi Wang

Abstract: Distributed machine learning (DML) in mobile environments faces significant communication bottlenecks. Gradient compression has emerged as an effective solution to this issue, offering substantial benefits in environments with limited bandwidth and metered data. Yet existing compressors encounter a severe performance drop in non-IID environments due to a one-size-fits-all compression approach, which does not account for the varying data volumes across workers. Assigning varying compression ratios to workers with distinct data distributions and volumes is thus a promising solution. This study introduces an analysis of distributed SGD with non-uniform compression, which reveals that the convergence rate (indicative of the iterations needed to achieve a certain accuracy) is influenced by compression ratios applied to workers with differing volumes. Accordingly, we frame relative compression ratio assignment as an $n$-variables chi-square nonlinear optimization problem, constrained by a fixed and limited communication budget. We propose DAGC-R, which assigns conservative compression to the workers handling larger data volumes. Recognizing the computational limitations of mobile devices, we propose DAGC-A, which is computationally less demanding and enhances the robustness of the absolute gradient compressor in non-IID scenarios. Our experiments confirm that both DAGC-A and DAGC-R can achieve better performance when dealing with highly imbalanced data volume distribution and restricted communication.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.04025 [pdf, other]

General relativistic stochastic thermodynamics

Authors: Tao Wang, Yifan Cai, Long Cui, Liu Zhao

Abstract: Based on the recent work [1,2], we formulate the first law and the second law of stochastic thermodynamics in the framework of general relativity. These laws are established for a charged Brownian particle moving in a heat reservoir and subject to an external electromagnetic field in a generic stationary spacetime background, and in order to maintain general covariance, they are presented respectively in terms of the divergences of the energy current and the entropy density current. The stability of the equilibrium state is also analyzed.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 15 pages, 1 figure

arXiv:2311.03924 [pdf, ps, other]

Follow-up on the Supermassive Black Hole Binary Candidate J1048+7143: Successful Prediction of the Next Gamma-ray Flare and Refined Binary Parameters in the Framework of Jet Precession Model

Authors: Emma Kun, Ilja Jaroschewski, Julia Becker Tjus, Silke Britzen, Sándor Frey, Krisztina Éva Gabányi, Lang Cui, Xin Wang, Yuling Shen

Abstract: Analyzing single-dish and VLBI radio, as well as \textit{Fermi}-LAT $γ$-ray observations, we explained the three major flares in the $γ$-ray light curve of FSRQ J1048+7143 with the spin--orbit precession of the dominant mass black hole in a supermassive black hole binary system. Here, we report on the detection of a fourth $γ$-ray flare from J1048+7143, appearing in the time interval which was predicted in our previous work. Including this new flare, we constrained the mass ratio into a narrow range of $0.062<q<0.088$, and consequently we were able to further constrain the parameters of the hypothetical supermassive binary black hole at the heart of J1048+7143. We predict the occurrence of the fifth major $γ$-ray flare that would appear only if the jet still lies close to our line of sight. The fourth major $γ$-ray flare also shows the two-subflare structure, further strengthening our scenario in which the occurrence of the subflares is the signature of the precession of a spine--sheath jet structure that quasi-periodically interacts with a proton target, e.g. clouds in the broad-line region.

Submitted 8 February, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 9 pages, 4 figures, 3 tables. Accepted to ApJL

arXiv:2311.02395 [pdf, ps, other]

doi 10.1093/mnras/stad3583

Multi-band Cross-correlated Radio Variability of the Blazar 3C 279

Authors: Krishna Mohana A, Alok C. Gupta, Alan P. Marscher, Yulia V. Sotnikova, S. G. Jorstad, Paul J. Wiita, Lang Cui, Margo F. Aller, Hugh D. Aller, Yu. A. Kovalev, Y. Y. Kovalev, Xiang Liu, T. V. Mufakharov, A. V. Popkov, M. G. Mingaliev, A. K. Erkenov, N. A. Nizhelsky, P. G. Tsybulev, Wei Zhao, Z. R. Weaver, D. A. Morozova

Abstract: We present the results of our study of cross-correlations between long-term multi-band observations of the radio variability of the blazar 3C 279. More than a decade (2008-2022) of radio data were collected at seven different frequencies ranging from 2 GHz to 230 GHz. The multi-band radio light curves show variations in flux, with the prominent flare features appearing first at higher-frequency and later in lower-frequency bands. This behavior is quantified by cross-correlation analysis, which finds that the emission at lower-frequency bands lags that at higher-frequency bands. Lag versus frequency plots are well fit by straight lines with negative slope, typically ~-30 day/GHz. We discuss these flux variations in conjunction with the evolution of bright moving knots seen in multi-epoch VLBA maps to suggest possible physical changes in the jet that can explain the observational results. Some of the variations are consistent with the predictions of shock models, while others are better explained by a changing Doppler beaming factor as the knot trajectory bends slightly, given a small viewing angle to the jet.

Submitted 4 November, 2023; originally announced November 2023.

Comments: Submitted revised version to MNRAS journal, 11 pages, 6 figures, 4 tables

Journal ref: MNRAS 527 (2024) 6970

arXiv:2310.20381 [pdf, other]

A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis

Authors: Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lei Wang, Lingqiao Liu, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou

Abstract: This work conducts an evaluation of GPT-4V's multimodal capability for medical image analysis, with a focus on three representative tasks of radiology report generation, medical visual question answering, and medical visual grounding. For the evaluation, a set of prompts is designed for each task to induce the corresponding capability of GPT-4V to produce sufficiently good outputs. Three evaluation ways including quantitative analysis, human evaluation, and case study are employed to achieve an in-depth and extensive evaluation. Our evaluation shows that GPT-4V excels in understanding medical images and is able to generate high-quality radiology reports and effectively answer questions about medical images. Meanwhile, it is found that its performance for medical visual grounding needs to be substantially improved. In addition, we observe the discrepancy between the evaluation outcome from quantitative analysis and that from human evaluation. This discrepancy suggests the limitations of conventional metrics in assessing the performance of large language models like GPT-4V and the necessity of developing new metrics for automatic quantitative analysis.

Submitted 30 January, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.19740 [pdf, other]

Collaborative Evaluation: Exploring the Synergy of Large Language Models and Humans for Open-ended Generation Evaluation

Authors: Qintong Li, Leyang Cui, Lingpeng Kong, Wei Bi

Abstract: Humans are widely involved in the evaluation of open-ended natural language generation tasks (NLG) that demand creativity, as automatic metrics often exhibit weak correlations with human judgments. Large language models (LLMs) recently have emerged as a scalable and cost-effective alternative to human evaluations. However, both humans and LLMs have limitations, i.e., inherent subjectivity and unreliable judgments, particularly for open-ended tasks that require adaptable metrics tailored to diverse task requirements. To explore the synergy between humans and LLM-based evaluators and address the challenges of existing inconsistent evaluation criteria in open-ended NLG tasks, we propose a Collaborative Evaluation pipeline CoEval, involving the design of a checklist of task-specific criteria and the detailed evaluation of texts, in which LLM generates initial ideation, and then humans engage in scrutiny. We conducted a series of experiments to investigate the mutual effects between LLMs and humans in CoEval. Results show that, by utilizing LLMs, CoEval effectively evaluates lengthy texts, saving significant time and reducing human evaluation outliers. Human scrutiny still plays a role, revising around 20% of LLM evaluation scores for ultimate reliability.

Submitted 30 October, 2023; originally announced October 2023.

Comments: We release our resources at \url{https://github.com/qtli/CoEval}

arXiv:2310.14274 [pdf, other]

Robust Visual Imitation Learning with Inverse Dynamics Representations

Authors: Siyuan Li, Xun Wang, Rongchang Zuo, Kewu Sun, Lingfei Cui, Jishiyu Ding, Peng Liu, Zhe Ma

Abstract: Imitation learning (IL) has achieved considerable success in solving complex sequential decision-making problems. However, current IL methods mainly assume that the environment for learning policies is the same as the environment for collecting expert datasets. Therefore, these methods may fail to work when there are slight differences between the learning and expert environments, especially for challenging problems with high-dimensional image observations. However, in real-world scenarios, it is rare to have the chance to collect expert trajectories precisely in the target learning environment. To address this challenge, we propose a novel robust imitation learning approach, where we develop an inverse dynamics state representation learning objective to align the expert environment and the learning environment. With the abstract state representation, we design an effective reward function, which thoroughly measures the similarity between behavior data and expert data not only element-wise, but also from the trajectory level. We conduct extensive experiments to evaluate the proposed approach under various visual perturbations and in diverse visual control tasks. Our approach can achieve a near-expert performance in most environments, and significantly outperforms the state-of-the-art visual IL methods and robust IL methods.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.13345 [pdf, other]

An LLM can Fool Itself: A Prompt-Based Adversarial Attack

Authors: Xilie Xu, Keyi Kong, Ning Liu, Lizhen Cui, Di Wang, Jingfeng Zhang, Mohan Kankanhalli

Abstract: The wide-ranging applications of large language models (LLMs), especially in safety-critical domains, necessitate the proper evaluation of the LLM's adversarial robustness. This paper proposes an efficient tool to audit the LLM's adversarial robustness via a prompt-based adversarial attack (PromptAttack). PromptAttack converts adversarial textual attacks into an attack prompt that can cause the victim LLM to output the adversarial sample to fool itself. The attack prompt is composed of three important components: (1) original input (OI) including the original sample and its ground-truth label, (2) attack objective (AO) illustrating a task description of generating a new sample that can fool itself without changing the semantic meaning, and (3) attack guidance (AG) containing the perturbation instructions to guide the LLM on how to complete the task by perturbing the original sample at character, word, and sentence levels, respectively. Besides, we use a fidelity filter to ensure that PromptAttack maintains the original semantic meanings of the adversarial examples. Further, we enhance the attack power of PromptAttack by ensembling adversarial examples at different perturbation levels. Comprehensive empirical results using Llama2 and GPT-3.5 validate that PromptAttack consistently yields a much higher attack success rate compared to AdvGLUE and AdvGLUE++. Interesting findings include that a simple emoji can easily mislead GPT-3.5 to make wrong predictions.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.09015 [pdf, other]

doi 10.1038/s41586-023-06479-6

Precessing jet nozzle connecting to a spinning black hole in M87

Authors: Yuzhu Cui, Kazuhiro Hada, Tomohisa Kawashima, Motoki Kino, Weikang Lin, Yosuke Mizuno, Hyunwook Ro, Mareki Honma, Kunwoo Yi, Jintao Yu, Jongho Park, Wu Jiang, Zhiqiang Shen, Evgeniya Kravchenko, Juan-Carlos Algaba, Xiaopeng Cheng, Ilje Cho, Gabriele Giovannini, Marcello Giroletti, Taehyun Jung, Ru-Sen Lu, Kotaro Niinuma, Junghwan Oh, Ken Ohsuga, Satoko Sawada-Satoh, et al. (54 additional authors not shown)

Abstract: The nearby radio galaxy M87 offers a unique opportunity to explore the connections between the central supermassive black hole and relativistic jets. Previous studies of the inner region of M87 revealed a wide opening angle for the jet originating near the black hole. The Event Horizon Telescope resolved the central radio source and found an asymmetric ring structure consistent with expectations from General Relativity. With a baseline of 17 years of observations, there was a shift in the jet's transverse position, possibly arising from an eight to ten-year quasi-periodicity. However, the origin of this sideways shift remains unclear. Here we report an analysis of radio observations over 22 years that suggests a period of about 11 years in the position angle variation of the jet. We infer that we are seeing a spinning black hole that induces the Lense-Thirring precession of a misaligned accretion disk. Similar jet precession may commonly occur in other active galactic nuclei but has been challenging to detect owing to the small magnitude and long period of the variation.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 41 pages, 7 figures, 7 tables

Journal ref: 2023, Nature, 621, 711-715

arXiv:2310.07988 [pdf, other]

Recovery of phase constant from two-photon interference pattern by phase retrieval algorithm

Authors: Yuhang Lei, Wen Zhao, Liang Cui, Xiaoyin Li

Abstract: For a HOM interferometer with two independent incident pulses, the interference pattern can be affected by adding a dispersion medium on one of the incident directions, but there hasn't been a method to reconstruct the phase constant of the medium from the interference pattern. To solve it, we adapted two phase retrieval algorithms and used them to recover the phase difference function between the two incident fields, from which the phase constant can be derived. Through simulations, we verified the convergence, accuracy, and robustness of the algorithms, indicating that this phase recovery process can be completed well with negligible error. Our research finds a new application direction for the phase recovery algorithm, provides an algorithmic tool for high-order dispersion measurement using two-photon interference, and paves the way for a higher resolution and phase-sensitive quantum tomography.

Submitted 14 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 12 pages, 8 figures

arXiv:2310.07821 [pdf, other]

Non-autoregressive Text Editing with Copy-aware Latent Alignments

Authors: Yu Zhang, Yue Zhang, Leyang Cui, Guohong Fu

Abstract: Recent work has witnessed a paradigm shift from Seq2Seq to Seq2Edit in the field of text editing, with the aim of addressing the slow autoregressive inference problem posed by the former. Despite promising results, Seq2Edit approaches still face several challenges such as inflexibility in generation and difficulty in generalizing to other languages. In this work, we propose a novel non-autoregressive text editing method to circumvent the above issues, by modeling the edit process with latent CTC alignments. We make a crucial extension to CTC by introducing the copy operation into the edit space, thus enabling more efficient management of textual overlap in editing. We conduct extensive experiments on GEC and sentence fusion tasks, showing that our proposed method significantly outperforms existing Seq2Edit models and achieves similar or even better results than Seq2Seq with over $4\times$ speedup. Moreover, it demonstrates good generalizability on German and Russian. In-depth analyses reveal the strengths of our method in terms of the robustness under various scenarios and generating fluent and flexible outputs.

Submitted 11 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2310.07481 [pdf, other]

Iterative solution of relativistic Boltzmann equation in curved spacetime with application to kinetic coefficients

Authors: Long Cui, Xin Hao, Liu Zhao

Abstract: Under relaxation time approximation, we obtain an iterative solution to the relativistic Boltzmann equation in generic stationary spacetime. This solution provides a scheme to study non-equilibrium system order by order. As a specific example, we analytically calculated the covariant expressions of the particle flow and the energy momentum tensor up to the first order in relaxation time. Finally and most importantly, we present all 14 kinetic coefficients for a neutral system, which are verified to satisfy the Onsager reciprocal relation and guarantee a non-negative entropy production.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 15 pages, 1 figure

arXiv:2310.07299 [pdf, other]

RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation

Authors: Yue Zhang, Leyang Cui, Enbo Zhao, Wei Bi, Shuming Shi

Abstract: Grammatical Error Correction (GEC) systems play a vital role in assisting people with their daily writing tasks. However, users may sometimes come across a GEC system that initially performs well but fails to correct errors when the inputs are slightly modified. To ensure an ideal user experience, a reliable GEC system should have the ability to provide consistent and accurate suggestions when encountering irrelevant context perturbations, which we refer to as context robustness. In this paper, we introduce RobustGEC, a benchmark designed to evaluate the context robustness of GEC systems. RobustGEC comprises 5,000 GEC cases, each with one original error-correct sentence pair and five variants carefully devised by human annotators. Utilizing RobustGEC, we reveal that state-of-the-art GEC systems still lack sufficient robustness against context perturbations. In addition, we propose a simple yet effective method for remitting this issue.

Submitted 11 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (main conference, long paper)

arXiv:2310.07163 [pdf]

doi 10.1007/s11433-023-2131-1

The Qitai Radio Telescope

Authors: Na Wang, Qian Xu, Jun Ma, Zhiyong Liu, Qi Liu, Hailong Zhang, Xin Pei, Maozheng Chen, Richard N. Manchester, Kejia Lee, Xingwu Zheng, Hans J. Kärcher, Wulin Zhao, Hongwei Li, Dongwei Li, Martin Süss, Matthias Reichert, Zhongyi Zhu, Congsi Wang, Mingshuai Li, Rui Li, Ning Li, Guljaina Kazezkhan, Wenming Yan, Gang Wu, et al. (3 additional authors not shown)

Abstract: This study presents a general outline of the Qitai radio telescope (QTT) project. Qitai, the site of the telescope, is a county of Xinjiang Uygur Autonomous Region of China, located in the east Tianshan Mountains at an elevation of about 1800 m. The QTT is a fully steerable, Gregorian-type telescope with a standard parabolic main reflector of 110 m diameter. The QTT has adopted an umbrella support, homology-symmetric lightweight design. The main reflector is active so that the deformation caused by gravity can be corrected. The structural design aims to ultimately allow high-sensitivity observations from 150 MHz up to 115 GHz. To satisfy the requirements for early scientific goals, the QTT will be equipped with ultra-wideband receivers and large field-of-view multi-beam receivers. A multi-function signal-processing system based on RFSoC and GPU processor chips will be developed. These will enable the QTT to operate in pulsar, spectral line, continuum and Very Long Baseline Interferometer (VLBI) observing modes. Electromagnetic compatibility (EMC) and radio frequency interference (RFI) control techniques are adopted throughout the system design. The QTT will form a world-class observational platform for the detection of low-frequency (nanoHertz) gravitational waves through pulsar timing array (PTA) techniques, pulsar surveys, the discovery of binary black-hole systems, and exploring dark matter and the origin of life in the universe.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 12 pages, 11 figures, accepted for publication in Science China Physics, Mechanics & Astronomy

Journal ref: Sci China-Phys Mech Astron, 2023, 66: 289512

arXiv:2310.05341 [pdf, other]

A Critical Look at Classic Test-Time Adaptation Methods in Semantic Segmentation

Authors: Chang'an Yi, Haotian Chen, Yifan Zhang, Yonghui Xu, Lizhen Cui

Abstract: Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to potential distribution shifts in the test data. Most existing TTA studies, however, focus on classification tasks, leaving a notable gap in the exploration of TTA for semantic segmentation. This pronounced emphasis on classification might lead numerous newcomers and engineers to mistakenly assume that classic TTA methods designed for classification can be directly applied to segmentation. Nonetheless, this assumption remains unverified, posing an open question. To address this, we conduct a systematic, empirical study to disclose the unique challenges of segmentation TTA, and to determine whether classic TTA strategies can effectively address this task. Our comprehensive results have led to three key observations. First, the classic batch norm updating strategy, commonly used in classification TTA, only brings slight performance improvement, and in some cases it might even adversely affect the results. Even with the application of advanced distribution estimation techniques like batch renormalization, the problem remains unresolved. Second, the teacher-student scheme does enhance training stability for segmentation TTA in the presence of noisy pseudo-labels. However, it cannot directly result in performance improvement compared to the original model without TTA. Third, segmentation TTA suffers a severe long-tailed imbalance problem, which is substantially more complex than that in TTA for classification. This long-tailed challenge significantly affects segmentation TTA performance, even when the accuracy of pseudo-labels is high. In light of these observations, we conclude that TTA for segmentation presents significant challenges, and simply using classic TTA methods cannot address this problem well.

Submitted 11 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2310.02930 [pdf, ps, other]

Small-Disturbance Input-to-State Stability of Perturbed Gradient Flows: Applications to LQR Problem

Authors: Leilei Cui, Zhong-Ping Jiang, Eduardo D. Sontag

Abstract: This paper studies the effect of perturbations on the gradient flow of a general nonlinear programming problem, where the perturbation may arise from inaccurate gradient estimation in the setting of data-driven optimization. Under suitable conditions on the objective function, the perturbed gradient flow is shown to be small-disturbance input-to-state stable (ISS), which implies that, in the presence of a small-enough perturbation, the trajectories of the perturbed gradient flow must eventually enter a small neighborhood of the optimum. This work was motivated by the question of robustness of direct methods for the linear quadratic regulator problem, and specifically the analysis of the effect of perturbations caused by gradient estimation or round-off errors in policy optimization. We show small-disturbance ISS for three of the most common optimization algorithms: standard gradient flow, natural gradient flow, and Newton gradient flow.

Submitted 16 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 20 pages

arXiv:2310.00919 [pdf, other]

BAAF: A Benchmark Attention Adaptive Framework for Medical Ultrasound Image Segmentation Tasks

Authors: Gongping Chen, Lei Zhao, Xiaotao Yin, Liang Cui, Jianxun Zhang, Yu Dai

Abstract: AI-based assisted diagnosis programs have been widely investigated on medical ultrasound images. The complex scenario of ultrasound images, in which the coupled interference of internal and external factors is severe, brings a unique challenge for localizing the object region automatically and precisely. In this study, we propose a more general and robust Benchmark Attention Adaptive Framework (BAAF) to help doctors segment or diagnose lesions and tissues in ultrasound images more quickly and accurately. Different from existing attention schemes, the BAAF consists of a parallel hybrid attention module (PHAM) and an adaptive calibration mechanism (ACM). Specifically, BAAF first coarsely calibrates the input features from the channel and spatial dimensions, and then adaptively selects more robust lesion or tissue characterizations from the coarse-calibrated feature maps. The design of BAAF further optimizes the "what" and "where" focus and selection problems in CNNs and seeks to improve the segmentation accuracy of lesions or tissues in medical ultrasound images. The method is evaluated on four medical ultrasound segmentation tasks, and the experimental results demonstrate a remarkable performance improvement over existing state-of-the-art methods. In addition, the comparison with existing attention mechanisms also demonstrates the superiority of BAAF. This work provides the possibility for automated medical ultrasound assisted diagnosis and reduces reliance on human accuracy and precision.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.17415 [pdf, other]

Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts

Authors: Jiahao Ying, Yixin Cao, Kai Xiong, Yidong He, Long Cui, Yongbin Liu

Abstract: This study investigates the behaviors of Large Language Models (LLMs) when faced with conflicting prompts versus their internal memory. This will not only help to understand LLMs' decision mechanism but also benefit real-world applications, such as retrieval-augmented generation (RAG). Drawing on cognitive theory, we target the first scenario of decision-making styles where there is no superiority in the conflict and categorize LLMs' preference into dependent, intuitive, and rational/irrational styles. Another scenario of factual robustness considers the correctness of prompt and memory in knowledge-intensive tasks, which can also distinguish if LLMs behave rationally or irrationally in the first scenario. To quantify them, we establish a complete benchmarking framework including a dataset, a robustness evaluation pipeline, and corresponding metrics. Extensive experiments with seven LLMs reveal their varying behaviors. With role play intervention, we can change the styles, but different models present distinct adaptivity and upper-bound. One of our key takeaways is to optimize models or the prompts according to the identified style. For instance, RAG models with high role play adaptability may dynamically adjust the interventions according to the quality of retrieval results: being dependent to better leverage informative context, and being intuitive when the external prompt is noisy.

Submitted 20 February, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.12641 [pdf, other]

Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects

Authors: Feng Yan, Xiaoheng Jiang, Yang Lu, Lisha Cui, Shupan Li, Jiale Cao, Mingliang Xu, Dacheng Tao

Abstract: Surface defect inspection is a very challenging task in which surface defects usually show weak appearances or exist under complex backgrounds. Most high-accuracy defect detection methods require expensive computation and storage overhead, making them less practical in some resource-constrained defect detection applications. Although some lightweight methods have achieved real-time inference speed with fewer parameters, they show poor detection accuracy in complex defect scenarios. To this end, we develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure. First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module. The proposed DSA performs element-wise similarity in channel dimension while maintaining linear complexity. In addition, we introduce a novel Channel Reference Attention (CRA) module before each decoder block to strengthen the representation of multi-level features in the bottom-up path. The proposed CRA exploits the channel correlation between features at different layers to adaptively enhance feature representation. The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with 17 other state-of-the-art methods. Specifically, GCANet achieves competitive accuracy (91.79% $F_β^{w}$, 93.55% $S_α$, and 97.35% $E_φ$) on SD-saliency-900 while running at 272 fps on a single GPU.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.11419 [pdf, other]

Kosmos-2.5: A Multimodal Literate Model

Authors: Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei

Abstract: We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for the future scaling of multimodal large language models.

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.01219 [pdf, other]

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Authors: Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

Abstract: While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.

Submitted 24 September, 2023; v1 submitted 3 September, 2023; originally announced September 2023.

Comments: work in progress; 32 pages

arXiv:2308.11459 [pdf, other]

Phase Dependent Hanbury-Brown and Twiss effect

Authors: Xuan Tang, Yunxiao Zhang, Xueshi Guo, Liang Cui, Xiaoying Li, Z. Y. Ou

Abstract: The Hanbury-Brown and Twiss (HBT) effect is the foundation for stellar intensity interferometry. However, it is a phase-insensitive two-photon interference effect. In this paper, we extend the HBT interferometer by mixing two phase-coherent input fields with coherent auxiliary fields before intensity correlation measurement and achieve phase-sensitive two-photon interference so as to measure the complete complex second-order coherence function of the input fields. This practical scheme paves the way for synthetic aperture imaging for astronomical applications in the optical regime. Pulsed input fields are also tested for potential remote sensing and ranging applications. We discuss the condition to implement the recently proposed entanglement-based telescopy scheme with the more realistic cw broadband anti-bunched light fields.

Submitted 30 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 5 pages, 6 figures

arXiv:2308.01578 [pdf, other]

Unsupervised Representation Learning for Time Series: A Review

Authors: Qianwen Meng, Hangwei Qian, Yong Liu, Yonghui Xu, Zhiqi Shen, Lizhen Cui

Abstract: Unsupervised representation learning approaches aim to learn discriminative feature representations from unlabeled data, without the requirement of annotating every sample. Enabling unsupervised representation learning is extremely crucial for time series data, due to its unique annotation bottleneck caused by its complex characteristics and lack of visual cues compared with other data modalities. In recent years, unsupervised representation learning techniques have advanced rapidly in various domains. However, there is a lack of systematic analysis of unsupervised representation learning approaches for time series. To fill the gap, we conduct a comprehensive literature review of existing rapidly evolving unsupervised representation learning approaches for time series. Moreover, we also develop a unified and standardized library, named ULTS (i.e., Unsupervised Learning for Time Series), to facilitate fast implementations and unified evaluations on various models. With ULTS, we empirically evaluate state-of-the-art approaches, especially the rapidly evolving contrastive learning methods, on 9 diverse real-world datasets. We further discuss practical considerations as well as open research challenges on unsupervised representation learning for time series to facilitate future research in this field.

Submitted 3 August, 2023; originally announced August 2023.

Comments: In submission to IEEE

arXiv:2307.12810 [pdf, other]

HeteFedRec: Federated Recommender Systems with Model Heterogeneity

Authors: Wei Yuan, Liang Qu, Lizhen Cui, Yongxin Tong, Xiaofang Zhou, Hongzhi Yin

Abstract: Owing to the nature of privacy protection, federated recommender systems (FedRecs) have garnered increasing interest in the realm of on-device recommender systems. However, most existing FedRecs only allow participating clients to collaboratively train a recommendation model of the same public parameter size. Training a model of the same size for all clients can lead to suboptimal performance since clients possess varying resources. For example, clients with limited training data may prefer to train a smaller recommendation model to avoid excessive data consumption, while clients with sufficient data would benefit from a larger model to achieve higher recommendation accuracy. To address the above challenge, this paper introduces HeteFedRec, a novel FedRec framework that enables the assignment of personalized model sizes to participants. In HeteFedRec, we present a heterogeneous recommendation model aggregation strategy, including a unified dual-task learning mechanism and a dimensional decorrelation regularization, to allow knowledge aggregation among recommender models of different sizes. Additionally, a relation-based ensemble knowledge distillation method is proposed to effectively distil knowledge from heterogeneous item embeddings. Extensive experiments conducted on three real-world recommendation datasets demonstrate the effectiveness and efficiency of HeteFedRec in training federated recommender systems under heterogeneous settings.

Submitted 5 December, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.10247 [pdf, other]

Automated Action Model Acquisition from Narrative Texts

Authors: Ruiqi Li, Leyang Cui, Songtuan Lin, Patrik Haslum

Abstract: Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents. Action model acquisition has been identified as a bottleneck in the application of planning technology, especially within narrative planning. Acquiring action models from narrative texts in an automated way is essential, but challenging because of the inherent complexities of such texts. We present NaRuto, a system that extracts structured events from narrative text and subsequently generates planning-language-style action models based on predictions of commonsense event relations, as well as textual contradictions and similarities, in an unsupervised manner. Experimental results in classical narrative planning domains show that NaRuto can generate action models of significantly better quality than existing fully automated methods, and even on par with those of semi-automated methods.

Submitted 17 July, 2023; originally announced July 2023.

Comments: 10 pages, 3 figures

arXiv:2307.08074 [pdf, other]

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

Authors: Longyue Wang, Zefeng Du, Donghuai Liu, Deng Cai, Dian Yu, Haiyun Jiang, Yan Wang, Leyang Cui, Shuming Shi, Zhaopeng Tu

Abstract: Modeling discourse -- the linguistic phenomena that go beyond individual sentences -- is a fundamental yet challenging aspect of natural language processing (NLP). However, existing evaluation benchmarks primarily focus on the evaluation of inter-sentence properties and overlook critical discourse phenomena that cross sentences. To bridge the gap, we propose Disco-Bench, a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks, covering understanding, translation, and generation. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena (e.g. cohesion and coherence) in Chinese and/or English. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge. In total, we evaluate 20 general-, in-domain and commercial models based on Transformer, advanced pretraining architectures and large language models (LLMs). Our results show (1) the challenge and necessity of our evaluation benchmark; (2) fine-grained pretraining based on literary document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope can significantly facilitate research in this field: https://github.com/longyuewangdcu/Disco-Bench.

Submitted 21 July, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

Comments: Zhaopeng Tu is the corresponding author

arXiv:2307.03021 [pdf]

Shadow operator: Effective dynamic load change operation training in air separation processes based on industrial nonlinear MPC and Bloom's taxonomy

Authors: Guanghui Yang, Zhijiang Shao, Rui Wang, Zuhua Xu, Lidan Cui

Abstract: A novel human-machine interactive training method for dynamic load change operation in air separation processes (ASPs) is proposed. A shadow operator (SO) is developed in this method to train ASP operators through industrial model predictive control (IMPC) and Bloom's taxonomy. First, a nonlinear two-layer IMPC machine algorithm is developed for dynamic load change operation. The IMPC uses a linear parameter varying prediction model and an iterative multi-step linearization algorithm to compute accurate control decisions. Second, a hierarchical human-machine cooperation model is established to improve the effectiveness of operation training. The model is inspired by an educational psychology framework (Bloom's taxonomy) and assists ASP operators in enhancing their dynamic operational skills. Finally, five dynamic training modes of the SO are designed based on the IMPC algorithm and the human-machine cooperation model. The practical application results demonstrate that the SO improves the effectiveness of skill acquisition for novice operators and the safety of dynamic operations.

Submitted 6 July, 2023; originally announced July 2023.

Comments: 16 pages, 18 figures

arXiv:2306.11485 [pdf, other]

Explicit Syntactic Guidance for Neural Text Generation

Authors: Yafu Li, Leyang Cui, Jianhao Yan, Yongjing Yin, Wei Bi, Shuming Shi, Yue Zhang

Abstract: Most existing text generation models follow the sequence-to-sequence paradigm. Generative Grammar suggests that humans generate natural language texts by learning language grammar. We propose a syntax-guided generation schema, which generates the sequence guided by a constituency parse tree in a top-down direction. The decoding process can be decomposed into two parts: (1) predicting the infilling texts for each constituent in the lexicalized syntax context given the source sentence; (2) mapping and expanding each constituent to construct the next-level syntax context. Accordingly, we propose a structural beam search method to find possible syntax structures hierarchically. Experiments on paraphrase generation and machine translation show that the proposed method outperforms autoregressive baselines, while also demonstrating effectiveness in terms of interpretability, controllability, and diversity.

Submitted 25 June, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: ACL 2023

arXiv:2306.10248 [pdf, other]

doi 10.3847/1538-4357/acdd74

Multi-wavelength temporal variability of the blazar PKS 1510-089

Authors: Q. Yuan, Pankaj Kushwaha, Alok C. Gupta, Ashutosh Tripathi, Paul J. Wiita, M. Zhang, X. Liu, Anne Lahteenmaki, Merja Tornikoski, Joni Tammi, Venkatessh Ramakrishnan, L. Cui, X. Wang, M. F. Gu, Cosimo Bambi, A. E. Volvach

Abstract: We perform correlation and periodicity search analyses on long-term multi-band light curves of the FSRQ 1510-089 observed by the space-based Fermi--Large Area Telescope in gamma-rays, the SMARTS and Steward Observatory telescopes in optical and near-infrared (NIR) and the 13.7 m radio telescope in Metsahovi Radio Observatory between 2008 and 2018. The z-transform discrete correlation function method is applied to study the correlation and possible time lags among these multi-band light curves. Among all pairs of wavelengths, the gamma-ray vs. optical/NIR and optical vs. NIR correlations show zero time lags; however, both the gamma-ray and optical/NIR emissions precede the radio radiation. The Generalized Lomb-Scargle periodogram, Weighted Wavelet Z-transform, and REDFIT techniques are employed to investigate the unresolved-core-emission dominated 37 GHz light curve and yield evidence for a quasi-period around 1540 days, although given the length of the whole data set it cannot be claimed to be significant. We also investigate the optical/NIR color variability and find that this source shows a simple redder-when-brighter behavior over time, even in the low flux state.

Submitted 16 June, 2023; originally announced June 2023.

Comments: Accepted for publication in ApJ; 20 pages, 9 figures, 4 tables

arXiv:2306.08871 [pdf, other]

Med-MMHL: A Multi-Modal Dataset for Detecting Human- and LLM-Generated Misinformation in the Medical Domain

Authors: Yanshen Sun, Jianfeng He, Shuo Lei, Limeng Cui, Chang-Tien Lu

Abstract: The pervasive influence of misinformation has far-reaching and detrimental effects on both individuals and society. The COVID-19 pandemic has witnessed an alarming surge in the dissemination of medical misinformation. However, existing datasets pertaining to misinformation predominantly focus on textual information, neglecting the inclusion of visual elements, and tend to center solely on COVID-19-related misinformation, overlooking misinformation surrounding other diseases. Furthermore, the potential of Large Language Models (LLMs), such as the ChatGPT developed in late 2022, in generating misinformation has been overlooked in previous works. To overcome these limitations, we present Med-MMHL, a novel multi-modal misinformation detection dataset in a general medical domain encompassing multiple diseases. Med-MMHL not only incorporates human-generated misinformation but also includes misinformation generated by LLMs like ChatGPT. Our dataset aims to facilitate comprehensive research and development of methodologies for detecting misinformation across diverse diseases and various scenarios, including human and LLM-generated misinformation detection at the sentence, document, and multi-modal levels. To access our dataset and code, visit our GitHub repository: \url{https://github.com/styxsys0927/Med-MMHL}.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.08604 [pdf, other]

A Unified Framework of Graph Information Bottleneck for Robustness and Membership Privacy

Authors: Enyan Dai, Limeng Cui, Zhengyang Wang, Xianfeng Tang, Yinghan Wang, Monica Cheng, Bing Yin, Suhang Wang

Abstract: Graph Neural Networks (GNNs) have achieved great success in modeling graph-structured data. However, recent works show that GNNs are vulnerable to adversarial attacks which can fool the GNN model to make desired predictions of the attacker. In addition, training data of GNNs can be leaked under membership inference attacks. This largely hinders the adoption of GNNs in high-stake domains such as e-commerce, finance and bioinformatics. Though investigations have been made in conducting robust predictions and protecting membership privacy, they generally fail to simultaneously consider the robustness and membership privacy. Therefore, in this work, we study a novel problem of developing robust and membership privacy-preserving GNNs. Our analysis shows that Information Bottleneck (IB) can help filter out noisy information and regularize the predictions on labeled samples, which can benefit robustness and membership privacy. However, structural noises and lack of labels in node classification challenge the deployment of IB on graph-structured data. To mitigate these issues, we propose a novel graph information bottleneck framework that can alleviate structural noises with neighbor bottleneck. Pseudo labels are also incorporated in the optimization to minimize the gap between the predictions on the labeled set and unlabeled set for membership privacy. Extensive experiments on real-world datasets demonstrate that our method can give robust predictions and simultaneously preserve membership privacy.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2305.15676 [pdf, other]

Enhancing Grammatical Error Correction Systems with Explanations

Authors: Yuejiao Fei, Leyang Cui, Sen Yang, Wai Lam, Zhenzhong Lan, Shuming Shi

Abstract: Grammatical error correction systems improve written communication by detecting and correcting language mistakes. To help language learners better understand why the GEC system makes a certain correction, the causes of errors (evidence words) and the corresponding error types are two key factors. To enhance GEC systems with explanations, we introduce EXPECT, a large dataset annotated with evidence words and grammatical error types. We propose several baselines and analysis to understand this task. Furthermore, human evaluation verifies our explainable GEC system's explanations can assist second-language learners in determining whether to accept a correction suggestion and in understanding the associated grammar rule.

Submitted 10 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 9 pages, 7 figures, accepted to the main conference of ACL 2023

arXiv:2305.13614 [pdf, other]

LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation

Authors: Siyuan Chen, Mengyue Wu, Kenny Q. Zhu, Kunyao Lan, Zhiling Zhang, Lyuchun Cui

Abstract: Empowering chatbots in the field of mental health is receiving an increasing amount of attention, while the development and evaluation of chatbots in psychiatric outpatient scenarios remains underexplored. In this work, we focus on exploring the potential of ChatGPT in powering chatbots for psychiatrist and patient simulation. We collaborate with psychiatrists to identify objectives and iteratively develop the dialogue system to closely align with real-world scenarios. In the evaluation experiments, we recruit real psychiatrists and patients to engage in diagnostic conversations with the chatbots, collecting their ratings for assessment. Our findings demonstrate the feasibility of using ChatGPT-powered chatbots in psychiatric scenarios and explore the impact of prompt designs on chatbot behavior and user experience.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.13242 [pdf, other]

MAGE: Machine-generated Text Detection in the Wild

Authors: Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang

Abstract: Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods on specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Empirical results show challenges in distinguishing machine-generated texts from human-authored ones across various scenarios, especially out-of-distribution. These challenges are due to the decreasing linguistic distinctions between the two sources. Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios. We release our resources at https://github.com/yafuly/MAGE.

Submitted 21 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: ACL 2024

arXiv:2305.13225 [pdf, other]

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

Authors: Yue Zhang, Leyang Cui, Deng Cai, Xinting Huang, Tao Fang, Wei Bi

Abstract: Proprietary Large Language Models (LLMs), such as ChatGPT, have garnered significant attention due to their exceptional capabilities in handling a diverse range of tasks. Recent studies demonstrate that open-sourced smaller foundational models, such as 7B-size LLaMA, can also display remarkable proficiency in tackling diverse tasks when fine-tuned using instruction-driven data. In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following, and explore whether LLMs can be beneficial and further improved for such targeted scenarios. We choose the writing-assistant scenario as the testbed, which includes seven writing tasks. We collect training data for these tasks, reframe them in an instruction-following format, and subsequently refine the LLM, specifically LLaMA, via instruction tuning. Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its ability on writing tasks. We also conduct more experiments and analyses to offer insights for future work on effectively fine-tuning LLaMA for specific scenarios. Finally, we initiate a discussion regarding the necessity of employing LLMs for only one targeted task, taking into account the efforts required for tuning and the resources consumed during deployment.

Submitted 9 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.12147 [pdf, other]

LogiCoT: Logical Chain-of-Thought Instruction-Tuning

Authors: Hanmeng Liu, Zhiyang Teng, Leyang Cui, Chaoli Zhang, Qiji Zhou, Yue Zhang

Abstract: Generative Pre-trained Transformer 4 (GPT-4) demonstrates impressive chain-of-thought reasoning ability. Recent work on self-instruction tuning, such as Alpaca, has focused on enhancing the general proficiency of models. These instructions enable the model to achieve performance comparable to GPT-3.5 on general tasks like open-domain text generation and paraphrasing. However, they fall short of helping the model handle complex reasoning tasks. To bridge the gap, this paper presents LogiCoT, a new instruction-tuning dataset for Logical Chain-of-Thought reasoning with GPT-4. We elaborate on the process of harvesting instructions for prompting GPT-4 to generate chain-of-thought rationales. LogiCoT serves as an instruction set for teaching models of logical reasoning and elicits general reasoning skills.

Submitted 28 October, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

arXiv:2305.10855 [pdf, other]

TextDiffuser: Diffusion Models as Text Painters

Authors: Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Abstract: Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds. TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark to serve as a comprehensive tool for evaluating text rendering quality. Through experiments and user studies, we show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text. The code, model, and dataset will be available at \url{https://aka.ms/textdiffuser}.

Submitted 30 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2305.10013 [pdf, other]

When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario

Authors: Chengcheng Han, Liqing Cui, Renyu Zhu, Jianing Wang, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao

Abstract: Large pre-trained language models (PLMs) have garnered significant attention for their versatility and potential for solving a wide spectrum of natural language processing (NLP) tasks. However, the cost of running these PLMs may be prohibitive. Furthermore, PLMs may not be open-sourced due to commercial considerations and potential risks of misuse, such as GPT-3. The parameters and gradients of PLMs are unavailable in this scenario. To solve the issue, black-box tuning has been proposed, which utilizes derivative-free optimization (DFO), instead of gradient descent, for training task-specific continuous prompts. However, these gradient-free methods still exhibit a significant gap compared to gradient-based methods. In this paper, we introduce gradient descent into black-box tuning scenario through knowledge distillation. Furthermore, we propose a novel method GDFO, which integrates gradient descent and derivative-free optimization to optimize task-specific continuous prompts in a harmonized manner. Experimental results show that GDFO can achieve significant performance gains over previous state-of-the-art methods.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.08461 [pdf, other]

doi 10.1103/PhysRevLett.131.160203

Quantum Reliability

Authors: L. X. Cui, Y-M. Du, C. P. Sun

Abstract: Quantum technology has led to increasingly sophisticated and complex quantum devices. Assessing their reliability (quantum reliability) is an important issue. Although reliability theory for classical devices has been well developed in industry and technology, a suitable metric on quantum reliability and its loss has not been systematically investigated. Since reliability-loss depends on the process, quantum fidelity does not always fully depict it. This study provides a metric of quantum reliability by shifting the focus from state-distinguishing to trajectory-distinguishing. In contrast to the conventional notion of classical reliability, which is evaluated using probabilistic measurements of binary logical variables, quantum reliability is grounded in the quantum probability amplitude or wave function. This research provides a universal framework for reliability theory encompassing both classical and quantum devices. It offers a new perspective on quantum engineering by elucidating how intensely the real quantum process a device undergoes influences its performance.

Submitted 22 October, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 5 pages, 3 figures. Comments welcome!

Journal ref: Phys.Rev.Lett.131,160203 (2023)

arXiv:2305.01221 [pdf, ps, other]

Affine Toda system of $\mathbf{A}$ and $\mathbf{C}^t$ type: compactness and affine Weyl group

Authors: Leilei Cui, Zhaohu Nie, Wen Yang

Abstract: The local mass is a fundamental quantized information that characterizes the blow-up solution to the Toda system and has a profound relationship with its underlying algebraic structure. In \cite{Lin-Yang-Zhong-2020}, it was observed that the associated Weyl group can be employed to represent this information for the $\mathbf{A}_n$, $\mathbf{B}_n$, $\mathbf{C}_n$ and $\mathbf{G}_2$ type Toda systems. The relationship between the local mass of blow-up solution and the corresponding affine Weyl group is further explored for some affine $\mathbf{B}$ type Toda systems in \cite{Cui-Wei-Yang-Zhang-2022}, where the possible local masses are explicitly expressed in terms of $8$ types. The current work presents a comprehensive study of the general affine $\mathbf{A}$ and $\mathbf{C}^t$ type Toda systems with arbitrary rank. At each stage of the blow-up process (via scaling), we can employ certain elements (known as "set chains") in the corresponding affine Weyl group to measure the variation of local mass. Consequently, we obtain the a priori estimate of the affine $\mathbf{A}$ and $\mathbf{C}^t$ type Toda systems with arbitrary number of singularities.

Submitted 2 May, 2023; originally announced May 2023.

Comments: 40 pages

arXiv:2305.00783 [pdf, other]

Explicit Knowledge Graph Reasoning for Conversational Recommendation

Authors: Xuhui Ren, Tong Chen, Quoc Viet Hung Nguyen, Lizhen Cui, Zi Huang, Hongzhi Yin

Abstract: Traditional recommender systems estimate user preference on items purely based on historical interaction records, thus failing to capture fine-grained yet dynamic user interests and letting users receive recommendation only passively. Recent conversational recommender systems (CRSs) tackle those limitations by enabling recommender systems to interact with the user to obtain her/his current preference through a sequence of clarifying questions. Despite the progress achieved in CRSs, existing solutions are far from satisfaction in the following two aspects: 1) current CRSs usually require each user to answer a quantity of clarifying questions before reaching the final recommendation, which harms the user experience; 2) there is a semantic gap between the learned representations of explicitly mentioned attributes and items. To address these drawbacks, we introduce the knowledge graph (KG) as the auxiliary information for comprehending and reasoning a user's preference, and propose a new CRS framework, namely Knowledge Enhanced Conversational Reasoning (KECR) system. As a user can reflect her/his preference via both attribute- and item-level expressions, KECR closes the semantic gap between two levels by embedding the structured knowledge in the KG. Meanwhile, KECR utilizes the connectivity within the KG to conduct explicit reasoning of the user demand, making the model less dependent on the user's feedback to clarifying questions. KECR can find a prominent reasoning chain to make the recommendation explainable and more rationale, as well as smoothen the conversation process, leading to better user experience and conversational recommendation accuracy. Extensive experiments on two real-world datasets demonstrate our approach's superiority over state-of-the-art baselines in both automatic evaluations and human judgments.

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2304.08492 [pdf, other]

STRAP: Structured Object Affordance Segmentation with Point Supervision

Authors: Leiyao Cui, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yixin Zhu

Abstract: With significant annotation savings, point supervision has been proven effective for numerous 2D and 3D scene understanding problems. This success is primarily attributed to the structured output space; i.e., samples with high spatial affinity tend to share the same labels. Sharing this spirit, we study affordance segmentation with point supervision, wherein the setting inherits an unexplored dual affinity: spatial affinity and label affinity. By label affinity, we refer to affordance segmentation as a multi-label prediction problem: A plate can be both holdable and containable. By spatial affinity, we refer to a universal prior that nearby pixels with similar visual features should share the same point annotation. To tackle label affinity, we devise a dense prediction network that enhances label relations by effectively densifying labels in a new domain (i.e., label co-occurrence). To address spatial affinity, we exploit a Transformer backbone for global patch interaction and a regularization loss. In experiments, we benchmark our method on the challenging CAD120 dataset, showing significant performance gains over prior methods.

Submitted 17 April, 2023; originally announced April 2023.

Comments: Code: https://github.com/LeiyaoCui/STRAP

arXiv:2304.07194 [pdf, ps, other]

Normalized solutions for a Kirchhoff type equations with potential in $\mathbb{R}^3$

Authors: Leilei Cui, Qihan He, Zongyan Lv, Xuexiu Zhong

Abstract: In the present paper, we study the existence of normalized solutions to the following Kirchhoff type equations \begin{equation*} -\left(a+b\int_{\R^3}|\nabla u|^2\right)\Delta u+V(x)u+\lambda u=g(u)~\hbox{in}~\R^3 \end{equation*} satisfying the normalized constraint $\displaystyle\int_{\R^3}u^2=c$, where $a,b,c>0$ are prescribed constants, and the nonlinearities $g(u)$ are very general and of mass super-critical. Under some suitable assumptions on $V(x)$ and $g(u)$, we can prove the existence of ground state normalized solutions $(u_c, \lambda_c)\in H^1(\R^3)\times\mathbb{R}$, for any given $c>0$. Due to the presence of the nonlocal term, the weak limit $u$ of any $(PS)_C$ sequence $\{w_n\}$ may not belong to the corresponding Pohozaev manifold, which is different from the local problem. So we have to overcome some new difficulties to gain the compactness of a $(PS)_C$ sequence.

Submitted 14 April, 2023; originally announced April 2023.

Comments: 21 pages

arXiv:2304.03501 [pdf, other]

Continuous Input Embedding Size Search For Recommender Systems

Authors: Yunke Qu, Tong Chen, Xiangyu Zhao, Lizhen Cui, Kai Zheng, Hongzhi Yin

Abstract: Latent factor models are the most popular backbones for today's recommender systems owing to their prominent performance. Latent factor models represent users and items as real-valued embedding vectors for pairwise similarity computation, and all embeddings are traditionally restricted to a uniform size that is relatively large (e.g., 256-dimensional). With the exponentially expanding user base and item catalog in contemporary e-commerce, this design is admittedly becoming memory-inefficient. To facilitate lightweight recommendation, reinforcement learning (RL) has recently opened up opportunities for identifying varying embedding sizes for different users/items. However, challenged by search efficiency and learning an optimal RL policy, existing RL-based methods are restricted to highly discrete, predefined embedding size choices. This leads to a largely overlooked potential of introducing finer granularity into embedding sizes to obtain better recommendation effectiveness under a given memory budget. In this paper, we propose continuous input embedding size search (CIESS), a novel RL-based method that operates on a continuous search space with arbitrary embedding sizes to choose from. In CIESS, we further present an innovative random walk-based exploration strategy to allow the RL policy to efficiently explore more candidate embedding sizes and converge to a better decision. CIESS is also model-agnostic and hence generalizable to a variety of latent factor RSs, whilst experiments on two real-world datasets have shown state-of-the-art performance of CIESS under different memory budgets when paired with three popular recommendation models.

Submitted 7 March, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: To appear in SIGIR'23

arXiv:2304.01612 [pdf, other]

EDeR: A Dataset for Exploring Dependency Relations Between Events

Authors: Ruiqi Li, Patrik Haslum, Leyang Cui

Abstract: Relation extraction is a central task in natural language processing (NLP) and information retrieval (IR) research. We argue that an important type of relation not explored in NLP or IR research to date is that of an event being an argument - required or optional - of another event. We introduce the human-annotated Event Dependency Relation dataset (EDeR) which provides this dependency relation. The annotation is done on a sample of documents from the OntoNotes dataset, which has the added benefit that it integrates with existing, orthogonal, annotations of this dataset. We investigate baseline approaches for predicting the event dependency relation, the best of which achieves an accuracy of 82.61 for binary argument/non-argument classification. We show that recognizing this relation leads to more accurate event extraction (semantic role labelling) and can improve downstream tasks that depend on this, such as co-reference resolution. Furthermore, we demonstrate that predicting the three-way classification into the required argument, optional argument or non-argument is a more challenging task.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2303.11991 [pdf, other]

doi 10.1145/3543873.3587601

Application of an ontology for model cards to generate computable artifacts for linking machine learning information from biomedical research

Authors: Muhammad Amith, Licong Cui, Kirk Roberts, Cui Tao

Abstract: Model card reports provide a transparent description of machine learning models which includes information about their evaluation, limitations, intended use, etc. Federal health agencies have expressed an interest in model card reports for research studies using machine-learning based AI. Previously, we have developed an ontology model for model card reports to structure and formalize these reports. In this paper, we demonstrate a Java-based library (OWL API, FaCT++) that leverages our ontology to publish computable model card reports. We discuss future directions and other use cases that highlight applicability and feasibility of ontology-driven systems to support FAIR challenges.

Submitted 21 March, 2023; originally announced March 2023.

Journal ref: Companion Proceedings of the ACM Web Conference 2023

Showing 51–100 of 375 results for author: Cui, L