Search | arXiv e-print repository

doi 10.1002/lpor.202301351

Subwavelength Photorefractive Grating in a Thin-Film Lithium Niobate Microcavity

Authors: Jiankun Hou, Jiefu Zhu, Ruixin Ma, Boyi Xue, Yicheng Zhu, Jintian Lin, Xiaoshun Jiang, Xianfeng Chen, Ya Cheng, Li Ge, Yuanlin Zheng, Wenjie Wan

Abstract: Subwavelength gratings play a fundamental and pivotal role in numerous science and applications for wave manipulation, exhibiting distinctive features such as filtering, phase manipulation, and anti-reflection. However, conventional fabrication methods for ultrasmall periodic structures are constrained by the fundamental optical diffraction limit, making it challenging to produce subwavelength gratings for optics. Here, we demonstrate a novel technique to build a reconfigurable subwavelength photorefractive grating (SPG) in a thin-film lithium niobate on the platform of an optical microcavity. Such SPGs are optically induced through the photorefractive effect and the subwavelength features originate from the spatial phase modulations of the pump's standing wave. The resulting SPGs lead to the mode splitting of two counter-propagating modes inside the microcavity, exhibiting an Electromagnetically Induced Transparency (EIT)-like transmission spectrum. Moreover, the unique subwavelength characteristic of SPGs enables first-order quasi-phase-matching for backward second-harmonic generation, a long-standing problem in nonlinear optics. Also, free-space-to-chip vertical nonlinear frequency conversion can be achieved in a similar manner. These results provide a flexible approach towards fabricating subwavelength gratings, which holds significant potential in various applications such as nonlinear frequency conversion, optical communication, sensing, and quantum technologies.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07792 [pdf, other]

Empowering Federated Learning for Massive Models with NVIDIA FLARE

Authors: Holger R. Roth, Ziyue Xu, Yuan-Ting Hsieh, Adithya Renduchintala, Isaac Yang, Zhihong Zhang, Yuhong Wen, Sean Yang, Kevin Lu, Kristopher Kersten, Camir Ricketts, Daguang Xu, Chester Chen, Yan Cheng, Andrew Feng

Abstract: In the ever-evolving landscape of artificial intelligence (AI) and large language models (LLMs), handling and leveraging data effectively has become a critical challenge. Most state-of-the-art machine learning algorithms are data-centric. However, as the lifeblood of model performance, necessary data cannot always be centralized due to various factors such as privacy, regulation, geopolitics, copyright issues, and the sheer effort required to move vast datasets. In this paper, we explore how federated learning enabled by NVIDIA FLARE can address these challenges with easy and scalable integration capabilities, enabling parameter-efficient and full supervised fine-tuning of LLMs for natural language processing and biopharmaceutical applications to enhance their accuracy and robustness.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.06200 [pdf]

doi 10.1103/PhysRevLett.132.256902

Enhanced Frequency Conversion in Parity-Time Symmetry Line

Authors: Jiankun Hou, Jiefu Zhu, Ruixin Ma, Boyi Xue, Yicheng Zhu, Jintian Lin, Xiaoshun Jiang, Yuanlin Zheng, Xianfeng Chen, Ya Cheng, Li Ge, Wenjie Wan

Abstract: Non-Hermitian degeneracies reveal intriguing and non-trivial behaviors in open physical systems. Examples like Parity-Time (PT) symmetry breaking, topological encircling chirality, and enhanced sensing near an exceptional point (EP) are often associated with the abrupt nature of the phase transition around these degeneracies. Here we experimentally observe a cavity-enhanced second-harmonic frequency (SHG) conversion on a PT symmetry line, i.e. a set consisting of open-ended isofrequency or isoloss lines, both terminated at EPs on the Riemann surface in parameter space. The enhancement factor can reach as high as 300, depending on the crossing point whether in the symmetry or the broken phase of the PT line. Moreover, such enhancement of SHG enables sensitive distance sensing with a nanometer resolution. Our works may pave the way for practical applications in sensing, frequency conversion, and coherent wave control.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.05956 [pdf, other]

Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting

Authors: Peng Chen, Yingying Zhang, Yunyao Cheng, Yang Shu, Yihang Wang, Qingsong Wen, Bin Yang, Chenjuan Guo

Abstract: Transformers for time series forecasting mainly model time series from limited or fixed scales, making it challenging to capture different characteristics spanning various scales. We propose Pathformer, a multi-scale Transformer with adaptive pathways. It integrates both temporal resolution and temporal distance for multi-scale modeling. Multi-scale division divides the time series into different temporal resolutions using patches of various sizes. Based on the division of each scale, dual attention is performed over these patches to capture global correlations and local details as temporal dependencies. We further enrich the multi-scale Transformer with adaptive pathways, which adaptively adjust the multi-scale modeling process based on the varying temporal dynamics of the input, improving the accuracy and generalization of Pathformer. Extensive experiments on eleven real-world datasets demonstrate that Pathformer not only achieves state-of-the-art performance by surpassing all current models but also exhibits stronger generalization abilities under various transfer scenarios. The code is made available at https://github.com/decisionintelligence/pathformer.

Submitted 6 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted by the 12th International Conference on Learning Representations (ICLR 2024)

arXiv:2402.05383 [pdf, other]

First measurement of the yield of $^8$He isotopes produced in liquid scintillator by cosmic-ray muons at Daya Bay

Authors: Daya Bay Collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding , et al. (177 additional authors not shown)

Abstract: Daya Bay presents the first measurement of cosmogenic $^8$He isotope production in liquid scintillator, using an innovative method for identifying cascade decays of $^8$He and its child isotope, $^8$Li. We also measure the production yield of $^9$Li isotopes using well-established methodology. The results, in units of 10$^{-8}μ^{-1}$g$^{-1}$cm$^{2}$, are 0.307$\pm$0.042, 0.341$\pm$0.040, and 0.546$\pm$0.076 for $^8$He, and 6.73$\pm$0.73, 6.75$\pm$0.70, and 13.74$\pm$0.82 for $^9$Li at average muon energies of 63.9~GeV, 64.7~GeV, and 143.0~GeV, respectively. The measured production rate of $^8$He isotopes is more than an order of magnitude lower than any other measurement of cosmogenic isotope production. It replaces the results of previous attempts to determine the ratio of $^8$He to $^9$Li production that yielded a wide range of limits from 0 to 30\%. The results provide future liquid-scintillator-based experiments with improved ability to predict cosmogenic backgrounds.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.05347 [pdf, ps, other]

Robust Implicit Adaptive Low Rank Time-Stepping Methods for Matrix Differential Equations

Authors: Daniel Appelö, Yingda Cheng

Abstract: In this work, we develop implicit rank-adaptive schemes for time-dependent matrix differential equations. The dynamic low rank approximation (DLRA) is a well-known technique to capture the dynamic low rank structure based on Dirac-Frenkel time-dependent variational principle. In recent years, it has attracted a lot of attention due to its wide applicability. Our schemes are inspired by the three-step procedure used in the rank adaptive version of the unconventional robust integrator (the so called BUG integrator) for DLRA. First, a prediction (basis update) step is made computing the approximate column and row spaces at the next time level. Second, a Galerkin evolution step is invoked using a base implicit solve for the small core matrix. Finally, a truncation is made according to a prescribed error threshold. Since the DLRA is evolving the differential equation projected on to the tangent space of the low rank manifold, the error estimate of the BUG integrator contains the tangent projection (modeling) error which cannot be easily controlled by mesh refinement. This can cause convergence issue for equations with cross terms. To address this issue, we propose a simple modification, consisting of merging the row and column spaces from the explicit step truncation method together with the BUG spaces in the prediction step. In addition, we propose an adaptive strategy where the BUG spaces are only computed if the residual for the solution obtained from the prediction space by explicit step truncation method, is too large.
We prove stability and estimate the local truncation error of the schemes under assumptions. We benchmark the schemes in several tests, such as anisotropic diffusion, solid body rotation and the combination of the two, to show robust convergence properties.

Submitted 17 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

MSC Class: 65

arXiv:2402.04195 [pdf, other]

Instance by Instance: An Iterative Framework for Multi-instance 3D Registration

Authors: Xinyue Cao, Xiyu Zhang, Yuxin Cheng, Zhaoshuai Qi, Yanning Zhang, Jiaqi Yang

Abstract: Multi-instance registration is a challenging problem in computer vision and robotics, where multiple instances of an object need to be registered in a standard coordinate system. In this work, we propose the first iterative framework called instance-by-instance (IBI) for multi-instance 3D registration (MI-3DReg). It successively registers all instances in a given scenario, starting from the easiest and progressing to more challenging ones. Throughout the iterative process, outliers are eliminated continuously, leading to an increasing inlier rate for the remaining and more challenging instances. Under the IBI framework, we further propose a sparse-to-dense-correspondence-based multi-instance registration method (IBI-S2DC) to achieve robust MI-3DReg. Experiments on the synthetic and real datasets have demonstrated the effectiveness of IBI and suggested the new state-of-the-art performance of IBI-S2DC, e.g., our MHF1 is 12.02%/12.35% higher than the existing state-of-the-art method ECC on the synthetic/real datasets.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 14 pages, 12 figures, 10 tables

arXiv:2402.02700 [pdf, ps, other]

Sample Complexity Characterization for Linear Contextual MDPs

Authors: Junze Deng, Yuan Cheng, Shaofeng Zou, Yingbin Liang

Abstract: Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. While CMDPs serve as an important framework to model many real-world applications with time-varying environments, they are largely unexplored from theoretical perspective. In this paper, we study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights. For both models, we propose novel model-based algorithms and show that they enjoy guaranteed $ε$-suboptimality gap with desired polynomial sample complexity. In particular, instantiating our result for the first model to the tabular CMDP improves the existing result by removing the reachability assumption. Our result for the second model is the first-known result for such a type of function approximation models. Comparison between our results for the two models further indicates that having context-varying features leads to much better sample efficiency than having common representations for all contexts under linear CMDPs.

Submitted 4 February, 2024; originally announced February 2024.

Comments: accepted to AIstats2024

arXiv:2402.02334 [pdf, other]

Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

Authors: Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin

Abstract: Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. This is attributed to its parallel additive and multiplicative attention operators and prompt-based optimization, which facilitate the separation of tabular samples in an extended space with arithmetically-engineered features. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data. Code is available at https://github.com/aigc-apps/AMFormer.

Submitted 19 March, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: 11 pages, 8 figures, to be published to AAAI2024

ACM Class: I.2.4

arXiv:2402.01220 [pdf, other]

Delving into Decision-based Black-box Attacks on Semantic Segmentation

Authors: Zhaoyu Chen, Zhengyang Shan, Jingwen Chang, Kaixun Jiang, Dingkang Yang, Yiting Cheng, Wenqiang Zhang

Abstract: Semantic segmentation is a fundamental visual task that finds extensive deployment in applications with security-sensitive considerations. Nonetheless, recent work illustrates the adversarial vulnerability of semantic segmentation models to white-box attacks. However, its adversarial robustness against black-box attacks has not been fully explored. In this paper, we present the first exploration of black-box decision-based attacks on semantic segmentation. First, we analyze the challenges that semantic segmentation brings to decision-based attacks through the case study. Then, to address these challenges, we first propose a decision-based attack on semantic segmentation, called Discrete Linear Attack (DLA). Based on random search and proxy index, we utilize the discrete linear noises for perturbation exploration and calibration to achieve efficient attack efficiency. We conduct adversarial robustness evaluation on 5 models from Cityscapes and ADE20K under 8 attacks. DLA shows its formidable power on Cityscapes by dramatically reducing PSPNet's mIoU from an impressive 77.83% to a mere 2.14% with just 50 queries.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.00036 [pdf, other]

Kronecker Product Feature Fusion for Convolutional Neural Network in Remote Sensing Scene Classification

Authors: Yinzhu Cheng

Abstract: Remote Sensing Scene Classification is a challenging and valuable research topic, in which Convolutional Neural Network (CNN) has played a crucial role. CNN can extract hierarchical convolutional features from remote sensing imagery, and Feature Fusion of different layers can enhance CNN's performance. Two successful Feature Fusion methods, Add and Concat, are employed in certain state-of-the-art CNN algorithms. In this paper, we propose a novel Feature Fusion algorithm, which unifies the aforementioned methods using the Kronecker Product (KPFF), and we discuss the Backpropagation procedure associated with this algorithm. To validate the efficacy of the proposed method, a series of experiments are designed and conducted. The results demonstrate its effectiveness of enhancing CNN's accuracy in Remote sensing scene classification.

Submitted 8 January, 2024; originally announced February 2024.

arXiv:2402.00033 [pdf, other]

LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition

Authors: Youbing Hu, Yun Cheng, Anqi Lu, Zhiqiang Cao, Dawei Wei, Jie Liu, Zhijun Li

Abstract: The Vision Transformer (ViT) excels in accuracy when handling high-resolution images, yet it confronts the challenge of significant spatial redundancy, leading to increased computational and memory requirements. To address this, we present the Localization and Focus Vision Transformer (LF-ViT). This model operates by strategically curtailing computational demands without impinging on performance. In the Localization phase, a reduced-resolution image is processed; if a definitive prediction remains elusive, our pioneering Neighborhood Global Class Attention (NGCA) mechanism is triggered, effectively identifying and spotlighting class-discriminative regions based on initial findings. Subsequently, in the Focus phase, this designated region is used from the original image to enhance recognition. Uniquely, LF-ViT employs consistent parameters across both phases, ensuring seamless end-to-end optimization. Our empirical tests affirm LF-ViT's prowess: it remarkably decreases Deit-S's FLOPs by 63\% and concurrently amplifies throughput twofold. Code of this project is at https://github.com/edgeai1/LF-ViT.git.

Submitted 7 January, 2024; originally announced February 2024.

arXiv:2401.17992 [pdf, other]

Multilinear Operator Networks

Authors: Yixin Cheng, Grigorios G. Chrysos, Markos Georgopoulos, Volkan Cevher

Abstract: Despite the remarkable capabilities of deep neural networks in image recognition, the dependence on activation functions remains a largely unexplored area and has yet to be eliminated. On the other hand, Polynomial Networks is a class of models that does not require activation functions, but have yet to perform on par with modern architectures. In this work, we aim close this gap and propose MONet, which relies solely on multilinear operators. The core layer of MONet, called Mu-Layer, captures multiplicative interactions of the elements of the input token. MONet captures high-degree interactions of the input elements and we demonstrate the efficacy of our approach on a series of image recognition and scientific computing benchmarks. The proposed model outperforms prior polynomial networks and performs on par with modern architectures. We believe that MONet can inspire further research on models that use entirely multilinear operations.

Submitted 31 January, 2024; originally announced January 2024.

Comments: International Conference on Learning Representations Poster(2024)

arXiv:2401.17773 [pdf, other]

doi 10.1109/TCSVT.2023.3303945

SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks

Authors: Xingning Dong, Qingpei Guo, Tian Gan, Qing Wang, Jianlong Wu, Xiangyuan Ren, Yuan Cheng, Wei Chu

Abstract: We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on the shortcomings of two mainstream pixel-level pre-training architectures (limited applications or less efficient), we propose Shared Network Pre-training (SNP). By employing one shared BERT-type network to refine textual and cross-modal features simultaneously, SNP is lightweight and could support various downstream applications. Second, based on the intuition that people always pay attention to several "significant words" when understanding a sentence, we propose the Significant Semantic Strengthening (S3) strategy, which includes a novel masking and matching proxy task to promote the pre-training performance. Experiments conducted on three downstream video-text tasks and six datasets demonstrate that, we establish a new state-of-the-art in pixel-level video-text pre-training; we also achieve a satisfactory balance between the pre-training efficiency and the fine-tuning performance. The codebase are available at https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/snps3_vtp.

Submitted 31 January, 2024; originally announced January 2024.

Comments: Accepted by TCSVT (IEEE Transactions on Circuits and Systems for Video Technology)

arXiv:2401.17475 [pdf, other]

A Dirac-type theorem for arbitrary Hamiltonian $H$-linked digraphs

Authors: Zhilan Wang, Jin Yan, Yangyang Cheng

Abstract: Given any digraph $D$, let $\mathcal{P}(D)$ be the family of all directed paths in $D$, and let $H$ be a digraph with the arc set $A(H)=\{a_1, \ldots, a_k\}$. The digraph $D$ is called arbitrary Hamiltonian $H$-linked if for any injective mapping $f: V(H)\rightarrow V(D)$ and any integer set $\mathcal{N}=\{n_1, \ldots, n_k\}$ with $n_i\geq4$ for each $i\in\{1, \ldots, k\}$, there exists a mapping $g: A(H)\rightarrow \mathcal{P}(D)$ such that for every arc $a_i=uv$, $g(a_i)$ is a directed path from $f(u)$ to $f(v)$ of length $n_i$, and different arcs are mapped into internally vertex-disjoint directed paths in $D$, and $\bigcup_{i\in[k]}V(g(a_i))=V(D)$. In this paper, we prove that for any digraph $H$ with $k$ arcs and $δ(H)\geq1$, every digraph of sufficiently large order $n$ with minimum in- and out-degree at least $n/2+k$ is arbitrary Hamiltonian $H$-linked. Furthermore, we show that the lower bound is best possible. Our main result extends some work of Kühn and Osthus et al. \cite{20081,20082} and Ferrara, Jacobson and Pfender \cite{Jacobson}. Besides, as a corollary of our main theorem, we solve a conjecture of Wang \cite{Wang} for sufficiently large graphs.

Submitted 30 January, 2024; originally announced January 2024.

MSC Class: 05C20; 05C70; 05C07

arXiv:2401.16402 [pdf, other]

A Survey on Visual Anomaly Detection: Challenge, Approach, and Prospect

Authors: Yunkang Cao, Xiaohao Xu, Jiangning Zhang, Yuqi Cheng, Xiaonan Huang, Guansong Pang, Weiming Shen

Abstract: Visual Anomaly Detection (VAD) endeavors to pinpoint deviations from the concept of normality in visual data, widely applied across diverse domains, e.g., industrial defect inspection, and medical lesion detection. This survey comprehensively examines recent advancements in VAD by identifying three primary challenges: 1) scarcity of training data, 2) diversity of visual modalities, and 3) complexity of hierarchical anomalies. Starting with a brief overview of the VAD background and its generic concept definitions, we progressively categorize, emphasize, and discuss the latest VAD progress from the perspective of sample number, data modality, and anomaly hierarchy. Through an in-depth analysis of the VAD field, we finally summarize future developments for VAD and conclude the key findings and contributions of this survey.

Submitted 29 January, 2024; originally announced January 2024.

Comments: Work in progress. Yunkang Cao, Xiaohao Xu, and Jiangning Zhang contribute equally to this work

arXiv:2401.15287 [pdf, other]

Applications of Tao General Difference in Discrete Domain

Authors: Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, Jingmao Cui

Abstract: Numerical difference computation is one of the cores and indispensable in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference in a finite interval, the TGD operators demonstrate exceptional signal processing capabilities in real-world applications. A novel smoothness property of a sequence is defined on the first- and second TGD. This property is used to denoise one-dimensional signals, where the noise is the non-smooth points in the sequence. Meanwhile, the center of the gradient in a finite interval can be accurately location via TGD calculation. This solves a traditional challenge in computer vision, which is the precise localization of image edges with noise robustness. Furthermore, the power of TGD operators extends to spatio-temporal edge detection in three-dimensional arrays, enabling the identification of kinetic edges in video data. These diverse applications highlight the properties of TGD in discrete domain and the significant promise of TGD for the computation across signal processing, image analysis, and video analytic.

Submitted 26 January, 2024; originally announced January 2024.

Comments: This paper is the application part of the paper "Tao General Differential and Difference: Theory and Application". The theory part of the paper is renamed as "A Theory of General Difference in Continuous and Discrete Domain", which is Arxived in arXiv:2305.08098v2

arXiv:2401.12961 [pdf, other]

doi 10.1145/3672198.3673797

Eloquent: A More Robust Transmission Scheme for LLM Token Streaming

Authors: Hanchen Li, Yuhan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang

Abstract: To render each generated token in real-time for users, the Large Language Model (LLM) server generates tokens one by one and streams each token (or group of a few tokens) through the network to the user right after generation, which we refer to as LLM token streaming. However, under unstable network conditions, the LLM token streaming experience could suffer greatly from stalls since one packet loss could block the rendering of later tokens even if the packets containing them arrive on time. With a measurement study, we show that current applications suffer from increased stalls under unstable networks. For this emerging token streaming problem in LLM Chatbots that differs from previous multimedia and text applications, we propose a novel transmission scheme, called Eloquent, which puts newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and, in the meantime, is independently rendered when received, avoiding the aforementioned stalls caused by missing packets. Through simulation under various networks, we show Eloquent reduces stall ratio (proportion of token rendering wait time) by 71.0% compared to the retransmission method commonly used by real chatbot applications and by 31.6% compared to the baseline packet duplication scheme. By tailoring Eloquent to fit the token-by-token generation of LLM, we enable the Chatbots to respond like an eloquent speaker for users to better enjoy pervasive AI.

Submitted 16 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: In SIGCOMM Workshop on Networks for AI Computing (NAIC '24)
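
The packet-construction idea in the Eloquent abstract (each outgoing packet carries the new tokens plus every token not yet acknowledged) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Packet`, `Sender`, and `Receiver` names are hypothetical, and the cumulative-acknowledgement scheme is an assumption made here to keep the example small. Packet size limits, timing, and the stall-ratio metric are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    start: int    # index of the first token carried by this packet
    tokens: list  # unacknowledged tokens followed by the new ones


@dataclass
class Sender:
    history: list = field(default_factory=list)  # every token generated so far
    acked: int = 0                               # tokens [0, acked) are acknowledged

    def make_packet(self, new_tokens):
        # Append the new tokens, then ship everything not yet acknowledged.
        self.history.extend(new_tokens)
        return Packet(start=self.acked, tokens=self.history[self.acked:])

    def on_ack(self, count):
        # Cumulative ack (assumption): the receiver has rendered `count` tokens.
        self.acked = max(self.acked, count)


@dataclass
class Receiver:
    rendered: list = field(default_factory=list)

    def on_packet(self, packet):
        have = len(self.rendered)
        # A packet is usable on its own whenever it starts at or before
        # the next token we need; skip the part we already rendered.
        if packet.start <= have:
            self.rendered.extend(packet.tokens[have - packet.start:])
        return len(self.rendered)  # value to send back as the cumulative ack


def demo():
    sender, receiver = Sender(), Receiver()
    sender.make_packet(["The"])          # lost in the network
    sender.make_packet([" quick"])       # lost in the network
    p3 = sender.make_packet([" fox"])    # carries all three tokens
    sender.on_ack(receiver.on_packet(p3))
    p4 = sender.make_packet(["."])       # only the new token is left to send
    receiver.on_packet(p4)
    return "".join(receiver.rendered), p4
```

In `demo()`, the first two packets are dropped, yet the third one alone lets the receiver render every token generated so far, which is the stall-avoidance property the abstract describes. The cost is redundancy: unacknowledged tokens are resent in every packet until an ack arrives.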

arXiv:2401.12920 [pdf, other]

Truck Parking Usage Prediction with Decomposed Graph Neural Networks

Authors: Rei Tamaru, Yang Cheng, Steven Parker, Ernie Perry, Bin Ran, Soyoung Ahn

Abstract: Truck parking on freight corridors faces the major challenge of insufficient parking spaces. This is exacerbated by the Hour-of-Service (HOS) regulations, which often result in unauthorized parking practices, causing safety concerns. It has been shown that providing accurate parking usage prediction can be a cost-effective solution to reduce unsafe parking practices. In light of this, existing studies have developed various methods to predict the usage of a truck parking site and have demonstrated satisfactory accuracy. However, these studies focus on a single parking site, and few approaches have been proposed to predict the usage of multiple truck parking sites considering spatio-temporal dependencies, due to the lack of data. This paper aims to fill this gap and presents the Regional Temporal Graph Neural Network (RegT-GCN) to predict parking usage across the entire state to provide more comprehensive truck parking information. The framework leverages the topological structures of truck parking site locations and historical parking data to predict the occupancy rate considering spatio-temporal dependencies across a state. To achieve this, we introduce a Regional Decomposition approach, which effectively captures the geographical characteristics of the truck parking locations and their spatial correlations. Evaluation results demonstrate that the proposed model outperforms other baseline models, improving performance by more than 20%.

Submitted 12 August, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.12531 [pdf, ps, other]

Some reflections on the relationship between logical incompleteness and concrete incompleteness

Authors: Yong Cheng

Abstract: In this paper, we aim to conceptually examine the relationship between logical incompleteness and concrete incompleteness, which both study the incompleteness phenomenon. We argue for two main theses. Firstly, the current research on concrete incompleteness reveals both similarities and differences between logical incompleteness and concrete incompleteness. Similarities between them are not universal, and differences between them are essential. Secondly, concrete incompleteness is a higher order phenomenon over logical incompleteness. This verifies that Hilbert's concrete and intuitive proof theory provides us essential new information from non-concrete and non-intuitive ideal proofs. We examine similarities between logical incompleteness and concrete incompleteness from two aspects: equivalences between logical incompleteness and concrete incompleteness, and the ubiquity of the incompleteness phenomenon in both logical incompleteness and concrete incompleteness. We examine differences between logical incompleteness and concrete incompleteness from five aspects: (1) the influence on Hilbert's program; (2) properties of independent sentences; (3) the intensionality problem; (4) the relationship with ordinal analysis; (5) the limit of provability.

Submitted 1 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: 26 pages

MSC Class: 03A05; 00A30; 03-02

arXiv:2401.12319 [pdf, other]

doi 10.3847/1538-4357/ad234a

Exploring the Gas-Phase Metallicity Gradients of Star-forming Galaxies at Cosmic Noon

Authors: Yingjie Cheng, Mauro Giavalisco, Raymond C. Simons, Zhiyuan Ji, Darren Stroupe, Nikko J. Cleri

Abstract: We explore the relationships between the [O/H] gas-phase metallicity radial gradients and multiple galaxy properties for 238 star-forming galaxies at 0.6<z<2.6 selected from the CANDELS Ly$α$ Emission at Reionization (CLEAR) survey with stellar mass 8.5 < log $M_{*}/M_{\odot}$ < 10.5. The gradients cover the range from -0.11 to 0.22 dex kpc$^{-1}$, with the median value close to zero. We reconstruct the nonparametric star-formation histories (SFHs) of the galaxies with spectral energy distribution modeling using Prospector with more than 40 photometric bands from HST, Spitzer and ground-based facilities. In general, we find weak or no correlations between the metallicity gradients and most galaxy properties, including the mass-weighted age, recent star formation rate, dust attenuation, and morphology as quantified by both parametric and non-parametric diagnostics. We find a significant but moderate correlation between the gradients and the 'evolutionary time', a temporal metric that characterizes the evolutionary status of a galaxy, with flatter gradients observed in more evolved galaxies. Also, there is evidence that galaxies with multiple star-formation episodes in their SFHs tend to develop more negative gas-phase metallicity gradients (higher [O/H] at the center). We conclude that gas kinematics, e.g. radial inflows and outflows, is likely an important process in setting the gas-phase metallicity gradients, in addition to the evolution of the SFH radial profile.
Since the gradients are largely independent of the galaxies' physical properties, and only weakly dependent on their SFH, it would appear that the timescale of the gas kinematics is significantly shorter than the evolution of star formation.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 22 pages, 21 figures, accepted for publication in APJ

Journal ref: The Astrophysical Journal, 2024, Volume 964, Issue 1, id.94, 17 pp

arXiv:2401.11944 [pdf, other]

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

Authors: Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu

Abstract: As the capabilities of large multimodal models (LMMs) continue to advance, evaluating the performance of LMMs emerges as an increasing need. Additionally, there is an even larger gap in evaluating the advanced knowledge and reasoning abilities of LMMs in non-English contexts such as Chinese. We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to evaluate LMMs on tasks demanding college-level subject knowledge and deliberate reasoning in a Chinese context. CMMMU is inspired by and strictly follows the annotation and analysis pattern of MMMU. CMMMU includes 12k manually collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, like its companion, MMMU. These questions span 30 subjects and comprise 39 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. CMMMU focuses on complex perception and reasoning with domain-specific knowledge in the Chinese context. We evaluate 11 open-source LLMs and one proprietary GPT-4V(ision). Even GPT-4V achieves an accuracy of only 42%, indicating a large space for improvement. CMMMU will boost the community to build the next-generation LMMs towards expert artificial intelligence and promote the democratization of LMMs by providing diverse language contexts.

Submitted 18 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.11749 [pdf, ps, other]

On Rosser theories

Authors: Yong Cheng

Abstract: Rosser theories play an important role in the study of the incompleteness phenomenon and meta-mathematics of arithmetic. In this paper, we first define notions of $n$-Rosser theories, exact $n$-Rosser theories, effectively $n$-Rosser theories and effectively exact $n$-Rosser theories (see Definition 1.6). Our definitions are not restricted to arithmetic languages. Then we systematically examine properties of $n$-Rosser theories and relationships among them. In particular, we generalize some important theorems about Rosser theories for RE sets in the literature to $n$-Rosser theories in a general setting.

Submitted 29 July, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 25 pages

MSC Class: 03F40; 03F30; 03F25

arXiv:2401.09769 [pdf, other]

Learning from Graphs with Heterophily: Progress and Future

Authors: Chenghua Gong, Yao Cheng, Xiang Li, Caihua Shan, Siqiang Luo

Abstract: Graphs are structured data that model complex relations between real-world entities. Heterophilous graphs, where linked nodes are prone to be with different labels or dissimilar features, have recently attracted significant attention and found many applications. Meanwhile, increasing efforts have been made to advance learning from heterophilous graphs. Although there exist surveys on the relevant topic, they focus on heterophilous GNNs, which are only sub-topics of heterophilous graph learning. In this survey, we comprehensively overview existing works on learning from graphs with heterophily. First, we collect over 180 publications and introduce the development of this field. Then, we systematically categorize existing methods based on a hierarchical taxonomy including learning strategies, model architectures and practical applications. Finally, we discuss the primary challenges of existing studies and highlight promising avenues for future research. More publication details and corresponding open-source codes can be accessed and will be continuously updated at our repositories: https://github.com/gongchenghua/Papers-Graphs-with-Heterophily.

Submitted 24 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.09745 [pdf, other]

Impact of Limited Statistics on the Measured Hyper-Order Cumulants of Net-Proton Distributions in Heavy-Ion Collisions

Authors: Lizhu Chen, Ye-Yin Zhao, Yunshan Cheng, Gang Wang, Zhiming Li, Yuanfang Wu

Abstract: Hyper-order cumulants $C_5/C_1$ and $C_6/C_2$ of net-baryon distributions are anticipated to offer crucial insights into the phase transition from quark-gluon plasma to hadronic matter in heavy-ion collisions. However, the accuracy of $C_5$ and $C_6$ is highly contingent on the fine shape of the distribution's tail, the detectable range of which could be essentially truncated by low statistics. In this paper, we use the fast Skellam-based simulations, as well as the Ultrarelativistic Quantum Molecular Dynamics model, to assess the impact of limited statistics on the measurements of $C_5/C_1$ and $C_6/C_2$ of net-proton distributions at lower RHIC energies. Both ratios decrease from the unity baseline as we reduce statistics, and could even turn negative without a pertinent physics mechanism. By incorporating statistics akin to experimental data, we can replicate the net-proton $C_5/C_1$ and $C_6/C_2$ values comparable to the corresponding measurements for Au+Au collisions at $\sqrt{s_{NN}} =$ 7.7, 11.5 and 14.5 GeV. Our findings underscore a caveat to the interpretation of the observed beam energy dependence of hyper-order cumulants.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 6 pages, 7 figures

arXiv:2401.08964 [pdf, other]

Evidence-centered Assessment for Writing with Generative AI

Authors: Yixin Cheng, Kayley Lyons, Guanliang Chen, Dragan Gasevic, Zachari Swiecki

Abstract: We propose a learning analytics-based methodology for assessing the collaborative writing of humans and generative artificial intelligence. Framed by the evidence-centered design, we used elements of knowledge-telling, knowledge transformation, and cognitive presence to identify assessment claims; we used data collected from the CoAuthor writing tool as potential evidence for these claims; and we used epistemic network analysis to make inferences from the data about the claims. Our findings revealed significant differences in the writing processes of different groups of CoAuthor users, suggesting that our method is a plausible approach to assessing human-AI collaborative writing.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.08834 [pdf, other]

Structure and lattice excitations of the copper substituted lead oxyapatite Pb$_{9.06(7)}$Cu$_{0.94(6)}$(PO$_{3.92(4)}$)$_{6}$O$_{0.96(3)}$

Authors: Qiang Zhang, Yingdong Guan, Yongqiang Cheng, Lujin Min, Jong K. Keum, Zhiqiang Mao, Matthew B. Stone

Abstract: The copper substituted lead oxyapatite, Pb$_{10-x}$Cu$_{x}$(PO$_{3.92(4)}$)$_{6}$O$_{0.96(3)}$ (x=0.94(6)) was studied using neutron and x-ray diffraction and neutron spectroscopy techniques. The crystal structure of the main phase of our sample, which has come to be colloquially known as LK-99, is verified to possess a hexagonal structure with space group $P 6_{3}/m$, alongside the presence of impurity phases Cu and Cu$_2$S. We determine the primary substitution location of the Cu as the Pb1 ($6h$) site, with a small substitution at the Pb2 ($4f$) site. Consequently, no clear Cu-doping-induced structural distortion was observed in the investigated temperature region between 10 K and 300 K. Specifically, we did not observe a reduction of coordination number at the Pb2 site or a clear tilting of the PO$_4$ tetrahedron. Magnetic characterization reveals a diamagnetic signal in the specimen, accompanied by a very weak ferromagnetic component at 2 K. No long-range magnetic order down to 10 K was detected by the neutron diffraction. Inelastic neutron scattering measurements did not show magnetic excitations for energies up to 350 meV. There is no sign of a superconducting resonance in the excitation spectrum of this material. The measured phonon density of states compares well with density functional theory calculations performed for the main LK-99 phase and its impurity phases. Our study may shed some insight into the role of the favored substitution site of copper in the absence of structural distortion and superconductivity in LK-99.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 11 pages, 8 figures. Physical Review Materials, In press

arXiv:2401.07708 [pdf, other]

Emergent Gauge Theory in Rydberg Atom Arrays

Authors: Yanting Cheng, Hui Zhai

Abstract: Rydberg atom arrays have emerged as a novel platform exhibiting rich quantum many-body physics and offering promise for universal quantum computation. The Rydberg blockade effect plays an essential role in establishing many-body correlations in this system. In this review, we will highlight that the lattice gauge theory is an efficient description of the Rydberg blockade effect and overview recent exciting developments in this system from equilibrium phases to quantum dynamics. These developments include realizing exotic ground states such as spin liquids, discovering quantum many-body scar states violating quantum thermalization, and observing confinement-deconfinement transition through quantum dynamics. We emphasize that the gauge theory description offers a universal theoretical framework to capture all these phenomena. This perspective of Rydberg atom arrays will further inspire the future development of quantum simulation and quantum computation in this platform.

Submitted 15 January, 2024; originally announced January 2024.

Comments: 12 pages, 5 figures

arXiv:2401.07081 [pdf, other]

6Rover: Leveraging Reinforcement Learning-based Address Pattern Mining Approach for Discovering Active Targets in IPv6 Unseeded Space

Authors: Zhichao Zhang, Zhaoxin Zhang, Yanan Cheng, Ning Li

Abstract: The discovery of active IPv6 addresses represents a pivotal challenge in IPv6 network survey, as it is a prerequisite for downstream tasks such as network topology measurements and security analysis. With the rapid spread of IPv6 networks in recent years, many researchers have focused on improving the hit rate, efficiency, and coverage of IPv6 scanning methods, resulting in considerable advancements. However, existing approaches remain heavily dependent on seed addresses, thereby limiting their effectiveness in unseeded prefixes. Consequently, this paper proposes 6Rover, a reinforcement learning-based model for active address discovery in unseeded environments. To overcome the reliance on seeded addresses, 6Rover constructs patterns with higher generality that reflect the actual address allocation strategies of network administrators, thereby avoiding biased transfers of patterns from seeded to unseeded prefixes. After that, 6Rover employs a multi-armed bandit model to optimize the probing resource allocation when applying patterns to unseeded spaces. It models the challenge of discovering optimal patterns in unseeded spaces as an exploration-exploitation dilemma, and progressively uncovers the potential patterns applied in unseeded spaces, leading to the efficient discovery of active addresses without seed addresses as prior knowledge. Experiments on large-scale unseeded datasets show that 6Rover has a higher hit rate than existing methods in the absence of any seed addresses as prior knowledge.
In real network environments, 6Rover achieved a 5% - 8% hit rate in seedless spaces with 100 million budget scale, representing an approximate 200% improvement over the existing state-of-the-art methods.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.06541 [pdf, other]

Medical Dialogue Generation via Intuitive-then-Analytical Differential Diagnosis

Authors: Kaishuai Xu, Wenjun Hou, Yi Cheng, Jian Wang, Wenjie Li

Abstract: Medical dialogue systems have attracted growing research attention as they have the potential to provide rapid diagnoses, treatment plans, and health consultations. In medical dialogues, a proper diagnosis is crucial as it establishes the foundation for future consultations. Clinicians typically employ both intuitive and analytic reasoning to formulate a differential diagnosis. This reasoning process hypothesizes and verifies a variety of possible diseases and strives to generate a comprehensive and rigorous diagnosis. However, recent studies on medical dialogue generation have overlooked the significance of modeling a differential diagnosis, which hinders the practical application of these systems. To address the above issue, we propose a medical dialogue generation framework with the Intuitive-then-Analytic Differential Diagnosis (IADDx). Our method starts with a differential diagnosis via retrieval-based intuitive association and subsequently refines it through a graph-enhanced analytic procedure. The resulting differential diagnosis is then used to retrieve medical knowledge and guide response generation. Experimental results on two datasets validate the efficacy of our method. Besides, we demonstrate how our framework assists both clinicians and patients in understanding the diagnostic process, for instance, by producing intermediate results and graph-based diagnosis paths.

Submitted 12 January, 2024; originally announced January 2024.

Comments: Work in progress

arXiv:2401.05970 [pdf]

On-chip wavelength division multiplexing by angled multimode interferometer fabricated on erbium-doped thin film lithium niobate on insulator

Authors: Jinli Han, Rui Bao, Rongbo Wu, Zhaoxiang Liu, Zhe Wang, Chao Sun, Zhihao Zhang, Mengqi Li, Zhiwei Fang, Min Wang, Haisu Zhang, Ya Cheng

Abstract: Photonic integrated circuits based on erbium doped thin film lithium niobate on insulator have attracted broad interest, with various waveguide amplifiers and microlasers demonstrated so far. Wideband operation facilitated by the broadband absorption and emission of erbium ions necessitates the functional integration of wavelength filter and multiplexer on the same chip. Here a low-loss wavelength division multiplexer at the resonant pumping and emission wavelengths (~1480 nm and 1530~1560 nm) of erbium ions, based on an angled multimode interferometer, is realized in the erbium doped thin film lithium niobate on insulator fabricated by the photolithography assisted chemomechanical etching technique. The minimum on-chip insertion losses of the fabricated device are <0.7 dB for both wavelength ranges, and a 3-dB bandwidth of >20 nm is measured at the telecom C-band. Besides, direct visualization of the multimode interference pattern by the visible upconversion fluorescence of erbium ions compares well with the simulated light propagation in the multimode interferometer. Spectral tuning of the wavelength division multiplexer by structural design is also demonstrated and discussed.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 11 pages, 5 figures

arXiv:2401.05654 [pdf, other]

Towards Conversational Diagnostic AI

Authors: Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, Shekoofeh Azizi, Karan Singhal, Yong Cheng, Le Hou, Albert Webson, Kavita Kulkarni, S Sara Mahdavi, Christopher Semturs, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Alan Karthikesalingam, Vivek Natarajan

Abstract: At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors.
Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 46 pages, 5 figures in main text, 19 figures in appendix

arXiv:2401.05507 [pdf, other]

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

Authors: Xueyu Hu, Ziyu Zhao, Shuang Wei, Ziwei Chai, Qianli Ma, Guoyin Wang, Xuwu Wang, Jing Su, Jingjing Xu, Ming Zhu, Yao Cheng, Jianbo Yuan, Jiwei Li, Kun Kuang, Yang Yang, Hongxia Yang, Fei Wu

Abstract: In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. These tasks require agents to solve complex tasks end-to-end by interacting with an execution environment. This benchmark contains DAEval, a dataset consisting of 257 data analysis questions derived from 52 CSV files, and an agent framework which incorporates LLMs to serve as data analysis agents for both serving and evaluation. Since data analysis questions are often open-ended and hard to evaluate without human supervision, we adopt a format-prompting technique to convert each question into a closed-form format so that they can be automatically evaluated. Our extensive benchmarking of 34 LLMs uncovers the current challenges encountered in data analysis tasks. In addition, building on top of our agent framework, we develop a specialized agent, DAAgent, which surpasses GPT-3.5 by 3.9% on DABench. Evaluation datasets and toolkits for InfiAgent-DABench are released at https://github.com/InfiAgent/InfiAgent .

Submitted 11 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: 27 pages, 7 figures, work in progress

arXiv:2401.04354 [pdf, other]

Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition

Authors: Xuzheng Yu, Chen Jiang, Wei Zhang, Tian Gan, Linlin Chao, Jianan Zhao, Yuan Cheng, Qingpei Guo, Wei Chu

Abstract: With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a high-level video representation to classify scenes in videos. Due to the diversity and complexity of video contents in realistic scenarios, this task remains a challenge. Most existing works identify scenes for videos only from visual or textual information in a temporal perspective, ignoring the valuable information hidden in single frames, while several earlier studies only recognize scenes for separate images in a non-temporal perspective. We argue that these two perspectives are both meaningful for this task and complementary to each other; meanwhile, externally introduced knowledge can also promote the comprehension of videos. We propose a novel two-stream framework to model video representations from multiple perspectives, i.e., temporal and non-temporal perspectives, and integrate the two perspectives in an end-to-end manner by self-distillation. Besides, we design a knowledge-enhanced feature fusion and label prediction method that contributes to naturally introducing knowledge into the task of video scene recognition. Experiments conducted on a real-world dataset demonstrate the effectiveness of our proposed method.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.03844 [pdf, other]

Fully Attentional Networks with Self-emerging Token Labeling

Authors: Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

Abstract: Recent studies indicate that Vision Transformers (ViTs) are robust against out-of-distribution scenarios. In particular, the Fully Attentional Network (FAN), a family of ViT backbones, has achieved state-of-the-art robustness. In this paper, we revisit the FAN models and improve their pre-training with a self-emerging token labeling (STL) framework. Our method contains a two-stage training framework. Specifically, we first train a FAN token labeler (FAN-TL) to generate semantically meaningful patch token labels, followed by a FAN student model training stage that uses both the token labels and the original class label. With the proposed STL framework, our best model based on FAN-L-Hybrid (77.3M parameters) achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data, outperforming the original FAN counterpart by significant margins. The proposed framework also demonstrates significantly enhanced performance on downstream tasks such as semantic segmentation, with up to 1.7% improvement in robustness over the counterpart model. Code is available at https://github.com/NVlabs/STL.

Submitted 8 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5585-5595

arXiv:2401.03476 [pdf, other]

Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness

Authors: Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu

Abstract: Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tackle these issues, we introduce FreeTalker, which, to the best of our knowledge, is the first framework for the generation of both spontaneous (e.g., co-speech gesture) and non-spontaneous (e.g., moving around the podium) speaker motions. Specifically, we train a diffusion-based model for speaker motion generation that employs unified representations of both speech-driven gestures and text-driven motions, utilizing heterogeneous data sourced from various motion datasets. During inference, we utilize classifier-free guidance to highly control the style in the clips. Additionally, to create smooth transitions between clips, we utilize DoubleTake, a method that leverages a generative prior and ensures seamless motion blending. Extensive experiments show that our method generates natural and controllable speaker movements. Our code, model, and demo are available at https://youngseng.github.io/FreeTalker/.

Submitted 7 January, 2024; originally announced January 2024.

Comments: 6 pages, 3 figures, ICASSP 2024

arXiv:2401.03428 [pdf, other]

Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects

Authors: Yuheng Cheng, Ceyao Zhang, Zhengwen Zhang, Xiangrui Meng, Sirui Hong, Wenhao Li, Zihao Wang, Zekai Wang, Feng Yin, Junhua Zhao, Xiuqiang He

Abstract: Intelligent agents stand out as a potential path toward artificial general intelligence (AGI). Thus, researchers have dedicated significant effort to diverse implementations for them. Benefiting from recent progress in large language models (LLMs), LLM-based agents that use universal natural language as an interface exhibit robust generalization capabilities across various applications. From serving as autonomous general-purpose task assistants to applications in coding, social, and economic domains, LLM-based agents offer extensive exploration opportunities. This paper surveys current research to provide an in-depth overview of LLM-based intelligent agents within single-agent and multi-agent systems. It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback. We also delve into the mechanisms of deploying LLM-based agents in multi-agent systems, including multi-role collaboration, message passing, and strategies to alleviate communication issues between agents. The discussions also shed light on popular datasets and application scenarios. We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.02901 [pdf, other]

Charged-current non-standard neutrino interactions at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding , et al. (177 additional authors not shown)

Abstract: The full data set of the Daya Bay reactor neutrino experiment is used to probe the effect of the charged current non-standard interactions (CC-NSI) on neutrino oscillation experiments. Two different approaches are applied and constraints on the corresponding CC-NSI parameters are obtained with the neutrino flux taken from the Huber-Mueller model with a $5\%$ uncertainty. For the quantum mechanics-based approach (QM-NSI), the constraints on the CC-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ are extracted with and without the assumption that the effects of the new physics are the same in the production and detection processes, respectively. The approach based on the weak effective field theory (WEFT-NSI) deals with four types of CC-NSI represented by the parameters $[\varepsilon_{X}]_{eα}$. For both approaches, the results for the CC-NSI parameters are shown for cases with various fixed values of the CC-NSI and the Dirac CP-violating phases, and when they are allowed to vary freely. We find that constraints on the QM-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ from the Daya Bay experiment alone can reach the order $\mathcal{O}(0.01)$ for the former and $\mathcal{O}(0.1)$ for the latter, while for WEFT-NSI parameters $[\varepsilon_{X}]_{eα}$, we obtain $\mathcal{O}(0.1)$ for both cases.

Submitted 19 March, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 25 pages, 16 figures, 6 tables; 36 pages, format changed, references added

arXiv:2401.00974 [pdf, other]

Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection Models

Authors: Yinan Cheng, Chi-Hua Wang, Vamsi K. Potluru, Tucker Balch, Guang Cheng

Abstract: Devising procedures for downstream task-oriented generative model selection is an unresolved problem of practical importance. Existing studies focused on the utility of a single family of generative models. They provided limited insight into how synthetic data practitioners should select the best family of generative models for synthetic training tasks given a specific combination of machine learning model class and performance metric. In this paper, we approach the downstream task-oriented generative model selection problem in the case of training fraud detection models and investigate the best practice given different combinations of model interpretability and model performance constraints. Our investigation supports that, while Neural Network (NN)-based and Bayesian Network (BN)-based generative models are both good at completing the synthetic training task under a loose model interpretability constraint, the BN-based generative models are better than the NN-based ones when training fraud detection models on synthetic data under a strict model interpretability constraint. Our results provide practical guidance for machine learning practitioners who are interested in replacing their real training dataset with a synthetic one, and shed light on more general downstream task-oriented generative model selection problems.

Submitted 1 January, 2024; originally announced January 2024.

Comments: The following article has been accepted by ICAIF22, Synthetic Data for AI in Finance; see https://sites.google.com/view/icaif-synthetic-2022/program

arXiv:2401.00701 [pdf, other]

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

Authors: Kaibin Tian, Yanhua Cheng, Yi Liu, Xinglin Hou, Quan Chen, Han Li

Abstract: In recent years, text-to-video retrieval methods based on CLIP have experienced rapid development. The primary direction of evolution is to exploit the much wider gamut of visual and textual cues to achieve alignment. Concretely, those methods with impressive performance often design a heavy fusion block for sentence (words)-video (frames) interaction, regardless of the prohibitive computation complexity. Nevertheless, these approaches are not optimal in terms of feature utilization and retrieval efficiency. To address this issue, we adopt multi-granularity visual feature learning, ensuring the model's comprehensiveness in capturing visual content features spanning from abstract to detailed levels during the training phase. To better leverage the multi-granularity features, we devise a two-stage retrieval architecture in the retrieval phase. This solution balances the coarse and fine granularity of retrieval content, and also balances retrieval effectiveness against efficiency. Specifically, in the training phase, we design a parameter-free text-gated interaction block (TIB) for fine-grained video representation learning and embed an extra Pearson Constraint to optimize cross-modal representation learning. In the retrieval phase, we use coarse-grained video representations for fast recall of top-k candidates, which are then reranked by fine-grained video representations. Extensive experiments on four benchmarks demonstrate the efficiency and effectiveness of our method. Notably, our method achieves comparable performance with the current state-of-the-art methods while being nearly 50 times faster.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2401.00625 [pdf, ps, other]

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models

Authors: Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao

Abstract: The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models like OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, bring forth substantial challenges in the high consumption of computational, memory, energy, and financial resources, especially in environments with limited resource capabilities. This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs. We categorize methods based on their optimization focus (computational, memory, energy, financial, and network resources) and their applicability across various stages of an LLM's lifecycle, including architecture design, pretraining, finetuning, and system design. Additionally, the survey introduces a nuanced categorization of resource efficiency techniques by their specific resource types, which uncovers the intricate relationships and mappings between various resources and corresponding optimization techniques. A standardized set of evaluation metrics and datasets is also presented to facilitate consistent and fair comparisons across different models and techniques. By offering a comprehensive overview of the current state of the art and identifying open research avenues, this survey serves as a foundational reference for researchers and practitioners, aiding them in developing more sustainable and efficient LLMs in a rapidly evolving landscape.

Submitted 3 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

Comments: Preprint. GitHub repo: https://github.com/tiingweii-shii/Awesome-Resource-Efficient-LLM-Papers

arXiv:2401.00395 [pdf, other]

Energetic Variational Gaussian Process Regression for Computer Experiments

Authors: Lulu Kang, Yuanxing Cheng, Yiwei Wang, Chun Liu

Abstract: The Gaussian process (GP) regression model is a widely employed surrogate modeling technique for computer experiments, offering precise predictions and statistical inference for the computer simulators that generate experimental data. Estimation and inference for GP can be performed in both frequentist and Bayesian frameworks. In this chapter, we construct the GP model through variational inference, particularly employing the recently introduced energetic variational inference method by Wang et al. (2021). Adhering to the GP model assumptions, we derive posterior distributions for its parameters. The energetic variational inference approach bridges Bayesian sampling and optimization and enables approximation of the posterior distributions and identification of the posterior mode. By incorporating a normal prior on the mean component of the GP model, we also apply shrinkage estimation to the parameters, facilitating mean function variable selection. To showcase the effectiveness of our proposed GP model, we present results from three benchmark examples.

Submitted 1 April, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

Comments: 19 pages, 7 figures, 3 tables

arXiv:2401.00204 [pdf, other]

Electromagnetic Radiation from Binary Stars Mediated by Ultralight Scalar

Authors: Ya-Ze Cheng, Wen-Hao Wu, Yan Cao

Abstract: We present the electromagnetic (EM) dipole radiation flux from an eccentric Keplerian binary endowed with scalar charges, in the presence of scalar-photon coupling $φA_μA^μ$ or $φF_{μν}F^{μν}$. The scalar radiation is suppressed for orbital frequency below the scalar mass, while the scalar-mediated indirect EM radiation survives. We examine the constraints imposed on the scalar-photon and scalar-charge couplings by the current observational data, in the case that the scalar charge is given by the muon number. The extensions of our calculation to the quadrupole order and hyperbolic orbit are also discussed.

Submitted 1 July, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

Comments: 28 pages, 5 figures, 2 tables; errors corrected; revised discussions on the asymptotic limits of indirect radiation; adding results for the angular momentum flux associated with the dipole radiation of massive scalar/vector fields and the 1PN charged binary conservative dynamics; comments are welcome

arXiv:2401.00151 [pdf, other]

CamPro: Camera-based Anti-Facial Recognition

Authors: Wenjun Zhu, Yuan Sun, Jiani Liu, Yushi Cheng, Xiaoyu Ji, Wenyuan Xu

Abstract: The proliferation of images captured from millions of cameras and the advancement of facial recognition (FR) technology have made the abuse of FR a severe privacy threat. Existing works typically rely on obfuscation, synthesis, or adversarial examples to modify faces in images to achieve anti-facial recognition (AFR). However, the unmodified images captured by camera modules that contain sensitive personally identifiable information (PII) could still be leaked. In this paper, we propose a novel approach, CamPro, to capture inborn AFR images. CamPro enables well-packed commodity camera modules to produce images that contain little PII and yet still contain enough information to support other non-sensitive vision applications, such as person detection. Specifically, CamPro tunes the configuration setup inside the camera image signal processor (ISP), i.e., color correction matrix and gamma correction, to achieve AFR, and designs an image enhancer to keep the image quality for possible human viewers. We implemented and validated CamPro on a proof-of-concept camera, and our experiments demonstrate its effectiveness on ten state-of-the-art black-box FR models. The results show that CamPro images can significantly reduce face identification accuracy to 0.3% while having little impact on the targeted non-sensitive vision application. Furthermore, we find that CamPro is resilient to adaptive attackers who have re-trained their FR models using images generated by CamPro, even with full knowledge of privacy-preserving ISP parameters.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Accepted by NDSS Symposium 2024

arXiv:2401.00148 [pdf, other]

TPatch: A Triggered Physical Adversarial Patch

Authors: Wenjun Zhu, Xiaoyu Ji, Yushi Cheng, Shibo Zhang, Wenyuan Xu

Abstract: Autonomous vehicles increasingly utilize the vision-based perception module to acquire information about driving environments and detect obstacles. Correct detection and classification are important to ensure safe driving decisions. Existing works have demonstrated the feasibility of fooling the perception models such as object detectors and image classifiers with printed adversarial patches. However, most of them are indiscriminately offensive to every passing autonomous vehicle. In this paper, we propose TPatch, a physical adversarial patch triggered by acoustic signals. Unlike other adversarial patches, TPatch remains benign under normal circumstances but can be triggered to launch a hiding, creating or altering attack by a designed distortion introduced by signal injection attacks towards cameras. To avoid the suspicion of human drivers and make the attack practical and robust in the real world, we propose a content-based camouflage method and an attack robustness enhancement method to strengthen it. Evaluations with three object detectors, YOLO V3/V5 and Faster R-CNN, and eight image classifiers demonstrate the effectiveness of TPatch in both the simulation and the real world. We also discuss possible defenses at the sensor, algorithm, and system levels.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Appeared in 32nd USENIX Security Symposium (USENIX Security 23)

arXiv:2312.17267 [pdf, other]

doi 10.1609/aaai.v38i16.29752

Enhancing Low-Resource Relation Representations through Multi-View Decoupling

Authors: Chenghao Fan, Wei Wei, Xiaoye Qu, Zhenyi Lu, Wenfeng Xie, Yu Cheng, Dangyang Chen

Abstract: Recently, prompt-tuning with pre-trained language models (PLMs) has significantly enhanced performance on relation extraction (RE) tasks. However, in low-resource scenarios, where the available training data is scarce, previous prompt-based methods may still perform poorly for prompt-based representation learning due to a superficial understanding of the relation. To this end, we highlight the importance of learning high-quality relation representation in low-resource scenarios for RE, and propose a novel prompt-based relation representation method, named MVRE (Multi-View Relation Extraction), to better leverage the capacity of PLMs to improve the performance of RE within the low-resource prompt-tuning paradigm. Specifically, MVRE decouples each relation into different perspectives to encompass multi-view relation representations for maximizing the likelihood during relation inference. Furthermore, we also design a Global-Local loss and a Dynamic-Initialization method for better alignment of the multi-view relation-representing virtual words, containing the semantics of relation labels during the optimization learning process and initialization. Extensive experiments on three benchmark datasets show that our method can achieve state-of-the-art performance in low-resource settings.

Submitted 29 May, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.17115 [pdf, other]

How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation

Authors: Yang Xiao, Yi Cheng, Jinlan Fu, Jiashuo Wang, Wenjie Li, Pengfei Liu

Abstract: In recent years, AI has demonstrated remarkable capabilities in simulating human behaviors, particularly those implemented with large language models (LLMs). However, due to the lack of systematic evaluation of LLMs' simulated behaviors, the believability of LLMs among humans remains ambiguous, i.e., it is unclear which behaviors of LLMs are convincingly human-like and which need further improvements. In this work, we design SimulateBench to evaluate the believability of LLMs when simulating human behaviors. Specifically, we evaluate the believability of LLMs based on two critical dimensions: 1) consistency: the extent to which LLMs can behave consistently with the given information of a human to simulate; and 2) robustness: the ability of LLMs' simulated behaviors to remain robust when faced with perturbations. SimulateBench includes 65 character profiles and a total of 8,400 questions to examine LLMs' simulated behaviors. Based on SimulateBench, we evaluate the performances of 10 widely used LLMs when simulating characters. The experimental results reveal that current LLMs struggle to align their behaviors with assigned characters and are vulnerable to perturbations in certain factors.

Submitted 15 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.16899 [pdf]

Anomalous exchange bias effect in ferromagnetic VI3 flakes

Authors: Xi Zhang, Xiuquan Xia, Qiye Liu, Yonggang He, Le Wang, Junhao Lin, Jia-Wei Mei, Yingchun Cheng, Jun-Feng Dai

Abstract: The exchange bias (EB) effect, pivotal in magnetic data storage and sensing devices, has been observed not only in interfacial regions but also in intrinsic ferromagnetic materials. Here, we've uncovered a robust and stable exchange bias effect within the layered van der Waals (vdW) ferromagnet VI3 employing magnetic circular dichroism microscopy. At 10 K, we observed a significant exchange field of approximately 0.1 T, accompanied by random shifts (positive or negative relative to zero magnetic field) after zero-field cooling. Notably, this effect is effectively controllable after field cooling, with shift direction opposing the applied magnetic field. The presence of strong magnetic anisotropic energy within VI3 results in larger coercivity-bound magnetic domains. These domains dictate the neighboring ferromagnetic alignment and induce shifts in the hysteresis loop. Our study not only contributes to comprehending fundamental nanoscale magnetic interactions but also sheds light on emergent phenomena within layered van der Waals magnets.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.15746 [pdf, other]

Large Language Models are Not Stable Recommender Systems

Authors: Tianhui Ma, Yuan Cheng, Hengshu Zhu, Hui Xiong

Abstract: With the significant successes of large language models (LLMs) in many natural language processing tasks, there is growing interest among researchers in exploring LLMs for novel recommender systems. However, we have observed that directly using LLMs as a recommender system is usually unstable due to their inherent position bias. To this end, we conduct exploratory research and find consistent patterns of positional bias in LLMs that influence the performance of recommendation across a range of scenarios. Then, we propose a Bayesian probabilistic framework, STELLA (Stable LLM for Recommendation), which involves a two-stage pipeline. During the first probing stage, we identify patterns in a transition matrix using a probing detection dataset. In the second recommendation stage, a Bayesian strategy is employed to adjust the biased output of LLMs with an entropy indicator. Therefore, our framework can capitalize on existing pattern information to calibrate instability of LLMs, and enhance recommendation performance. Finally, extensive experiments clearly validate the effectiveness of our framework.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.15637 [pdf, other]

doi 10.1016/j.physletb.2024.138735

Thermal Relic Right-Handed Neutrino Dark Matter

Authors: Yu Cheng, Jie Sheng, Tsutomu T. Yanagida

Abstract: It is known that two heavy Majorana right-handed neutrinos are sufficient to generate the baryon asymmetry in the present universe. Thus, it is interesting to identify the third right-handed neutrino $N$ with the dark matter. We impose a new discrete symmetry $Z_2$ on this dark matter neutrino to stabilize it. However, the $U(1)_{B-L}$ gauge boson $A'$ couples to the right-handed neutrino $N$. If the $B-L$ breaking scale $V_{B-L}$ is sufficiently low, the dark matter neutrino $N$ can be in the thermal bath. We find that the thermal relic $N$ can explain the dark matter abundance for the $B-L$ breaking scale $V_{B-L} \sim O(10)\,$TeV. After considering all the constraints from the existing experiments, a narrow mass region of the thermally produced right-handed neutrino dark matter $N$ still survives.

Submitted 21 May, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

Comments: 6 pages, 2 figures

Showing 201–250 of 1,683 results for author: Cheng, Y